ABSTRACT

We consider the problem of finding equivalent minimal-size
reformulations of SQL queries in presence of embedded
dependencies [1]. Our focus is on select-project-join
(SPJ) queries with equality comparisons, also known as 
safe conjunctive (CQ) queries, possibly with grouping 
and aggregation. For SPJ queries, the semantics of the 
SQL standard treat query answers as multisets (a.k.a. 
bags), whereas the stored relations may be treated ei- 
ther as sets, which is called bag-set semantics for query 
evaluation, or as bags, which is called bag semantics. 
(Under set semantics, both query answers and stored 
relations are treated as sets.) 

In the context of the above Query-Reformulation Problem,
we develop a comprehensive framework for equivalence
of CQ queries under bag and bag-set semantics in
presence of embedded dependencies, and make a num- 
ber of conceptual and technical contributions. Specif- 
ically, we develop equivalence tests for CQ queries in 
presence of arbitrary sets of embedded dependencies 
under bag and bag-set semantics, under the condition 
that chase [10] under set semantics (set-chase) on the
inputs terminates. We also present equivalence tests for 
aggregate CQ queries in presence of embedded depen- 
dencies. We use our equivalence tests to develop sound 
and complete (whenever set-chase on the inputs termi- 
nates) algorithms for solving instances of the Query- 
Reformulation Problem with CQ queries under each of 
bag and bag-set semantics, as well as for instances of 
the problem with aggregate queries. 

Some of our results are of independent interest. In 
particular, it is known that constraints that force some 
relations to be sets on all instances of a given database 
schema arise naturally in the context of sound (i.e., correct)
chase [9] under bag semantics. We develop a formal
framework for defining such constraints as embedded
dependencies, provided that row (tuple) IDs, commonly
used in commercial database-management systems,
are defined for the respective relations.

We also extend the condition of [4] for bag equivalence
of CQ queries to those cases where some relations are
set valued in all instances of the given schema. Our
proof of this nontrivial result includes reasoning involving
bag (non)containment. In particular, we provide
an original proof (adapted to our context) of the result
of [4] that CQ query $Q_1$ is bag contained in CQ query
$Q_2$ only if, for each predicate used in $Q_1$, $Q_2$ has at least
as many subgoals with this predicate as $Q_1$ does.

Our contributions are clearly applicable beyond the 
Query-Reformulation Problem considered in this pa- 
per. Specifically, the results of this paper can be used 
in developing algorithms for rewriting CQ queries and 
queries in more expressive languages (e.g., including 
grouping and aggregation, or arithmetic comparisons) 
using views in presence of embedded dependencies, un- 
der bag or bag-set semantics for query evaluation. 

This text contains corrections to Sections 2.4 and 4 of [5].

1. INTRODUCTION 

Query containment and equivalence were recognized 
fairly early as fundamental problems in database query 
evaluation and optimization. The reason is that, for conjunctive
queries (CQ queries), a broad class of frequently
used queries whose expressive power is equivalent to
that of select-project-join queries in relational algebra,
query equivalence can be used as a tool in query
optimization. Specifically, to find a more efficient and 
answer-preserving formulation of a given CQ query, it is 
enough to "try all ways" of arriving at a "shorter" query 
formulation, by removing query subgoals, in a process 
called query minimization [2]. A subgoal-removal step
succeeds only if equivalence (via containment) of the
"original" and "shorter" query formulations can be ensured.
The equivalence test of [2] for CQ queries is
known to be NP complete, whereas equivalence of gen- 
eral relational queries is undecidable. 

In recent years, there has been renewed interest in the 
study of query containment and equivalence, because 
of their close relationship to the problem of answering 
queries using views [17]. In particular, the problem of
rewriting relational queries equivalently using views has
been the subject of extensive rigorous investigations.
Please see [11, 17, 21, 23] for discussions of the state
of the art and of the numerous practical applications of 
the problem. A test for equivalence of a CQ query to 
its candidate CQ rewriting in terms of CQ views uses 
an equivalent transformation of the rewriting to its CQ 
expansion, which (informally speaking) replaces references
to views in the rewriting by their definitions [17, 23].
Then the equivalence test succeeds if and only if the
expansion of the rewriting is equivalent, via the equivalence
test of [2], to the input query.

Some of the investigations discussed in [11, 17, 21, 23]
focused on view-based query rewriting in presence
of integrity constraints (also called dependencies, see [1]
for an overview). For a given query, accounting for the
dependencies that hold on the database schema may
increase the number of equivalent rewritings of the query
using the given views. As a result, for a particular quality
metric on the rewritings being generated, one may
achieve better quality of the outputs of the rewriting
generator, with obvious practical advantages. Similarly,
accounting for the existing dependencies in reformulating
queries in a query optimizer could result in a larger
space of equivalent reformulations. For an illustration,
please see Example 4.1 in this paper.

In the settings of query reformulation and view-based 
query rewriting in presence of dependencies, Deutsch 
and colleagues have developed an algorithm, called Chase
and Backchase (C&B, see [11]) that, for a given CQ
query, outputs equivalent minimal-size CQ reformulations
or rewritings of the query. The technical restriction
on the algorithm is the requirement that the process
of "chasing" (see [1] for an overview) the input
query under the available dependencies terminate in finite
time. Intuitively, the point of the chase in C&B is
to use the available dependencies to derive a new query 
formulation, which can be used to check "dependency- 
aware" equivalence of the query to any candidate refor- 
mulation or rewriting by using any known dependency- 
free equivalence test (e.g., that of [2] for CQ queries).
Under the above restriction, the C&B algorithm is sound 
and complete for CQ queries, views, and rewritings/re- 
formulations in presence of embedded dependencies, which 
are known to be sufficiently expressive to specify all
usual integrity constraints, such as keys, foreign keys,
inclusion, join, and multivalued dependencies [10].

The above guarantees of C&B hold under set seman- 
tics for query evaluation, where both the database (stor- 
ed) relations and query answers are treated as sets. 
Query answering and rewriting in the set-semantics set- 
ting have been studied extensively in the database-theory 
literature. At the same time, the set semantics are 
not the default query-evaluation semantics in database 
systems in practice. Specifically, the expected semantics
of query evaluation in the standard query language
SQL [15] are bag-set semantics. That is, whenever a
query does not use the DISTINCT keyword, query
answers are treated in the SQL standard as multisets
(i.e., sets with duplicates, also called bags), whereas the
database relations are assumed to be sets.

Arguably, the default semantics of SQL are the bag se- 
mantics, where both query answers and stored relations 
are permitted to be bags. Indeed, by the SQL standard 
stored relations are bags, rather than sets, whenever the 
PRIMARY KEY and UNIQUE clauses (which arise from the 
best practices but are not required in the SQL standard)
are not part of the CREATE TABLE statement. Using
bag semantics in evaluating SQL queries becomes
imperative in presence of materialized views [17], where
the definitions of some of the views may not have in- 
cluded the DISTINCT keyword, even assuming that all 
the original stored relations are required to be sets. 

The problem of developing tests for equivalence of CQ 
queries under bag and bag-set semantics was solved by 
Chaudhuri and Vardi in [4]. The bag-set-semantics test
of [4] is also used in testing equivalence of queries with
grouping and aggregation [8, 22]. At the same time,
developing tests for equivalence of CQ queries under bag
or bag-set semantics in presence of embedded depen- 
dencies has been an open problem until now. To the 
best of our knowledge, the only efforts in this direction 
have been undertaken by Deutsch in [9] and by Cohen
in [6]; please see Section 7 for a more detailed discussion.
Neither effort has resulted in equivalence tests
for queries in presence of arbitrary sets of embedded 
dependencies, which may serve as an indication that 
the problem of developing tests for equivalence of CQ 
queries under bag or bag-set semantics in presence of 
embedded dependencies is not trivial. 

Our contributions. 

We consider the problem of finding equivalent minimal- 
size reformulations of SQL queries in presence of em- 
bedded dependencies, with a focus on select-project-join 
queries with equality comparisons, also known as safe 
CQ queries, possibly with grouping and aggregation. To 
construct algorithms that would solve instances of this
Query-Reformulation Problem (specified in Section 3),
we develop a comprehensive framework for equivalence 
of CQ queries under bag and bag-set semantics in pres- 
ence of embedded dependencies, and make a number of 
conceptual and technical contributions. Specifically: 

• We formulate sufficient and necessary conditions 
for correctness (soundness) of chase for CQ queries
and arbitrary sets of embedded dependencies under
bag and bag-set semantics, see Section 4.

• It has been shown [9] that constraints that force
some relations to be sets on all instances of a given
database schema arise naturally in the context of
sound chase under bag semantics. We develop a
formal framework for defining such constraints as
embedded dependencies, provided that row (tuple)
IDs (commonly used in commercial database-management
systems) are defined for the respective
relations. See Section 4 and Appendix O

• We extend the condition of [4] for bag equivalence
of CQ queries to those cases where some relations
are set valued in all instances of the given schema,
see Section 4. Our proof of this nontrivial result includes
reasoning involving bag (non)containment.
In particular, we provide an original proof (adapted
to our context) of the result of [4] that CQ query
$Q_1$ is bag contained in CQ query $Q_2$ only if, for
each predicate used in $Q_1$, $Q_2$ has at least as many
subgoals with this predicate as $Q_1$ does.

• We show that the result $Q_n$ of sound chase of a
CQ query $Q$ using a finite set $\Sigma$ of embedded dependencies
is unique under each of bag and bag-set
semantics, whenever set-chase of $Q$ using $\Sigma$ terminates.
We also provide a constructive characterization
of the maximal subset of $\Sigma$ that is satisfied
by the canonical database for $Q_n$. See Section 5.

• We provide equivalence tests for CQ queries in
presence of embedded dependencies under bag and
bag-set semantics, see Section 6.

• We present equivalence tests for CQ queries with
grouping and aggregation in presence of embedded
dependencies, see Section 6.2.

• Finally, we develop sound and complete (whenever
set-chase on the inputs terminates) algorithms for
solving instances of the Query-Reformulation Problem
with CQ queries under each of bag and bag-set
semantics, as well as instances of the problem with
aggregate queries, see Section 6.

Our contributions are clearly applicable beyond the 
Query-Reformulation Problem of Section 3. Specifically,
the results of this paper can be used in developing
algorithms for rewriting CQ queries and queries in more
expressive languages (e.g., including grouping and aggregation,
or including arithmetic comparisons [19]) using
views in presence of embedded dependencies, under
bag or bag-set semantics for query evaluation. Among 
other directions, our results could help solve the prob- 
lem of reformulation for XQueries with bag semantics 
on XML data. Such queries can be explicitly written 
using the keyword unordered, see [9] for a discussion. 

2. PRELIMINARIES 
2.1 The Basics 

A database schema $\mathcal{D}$ is a finite set of relation symbols
and their arities. A database (instance) $D$ over $\mathcal{D}$ has
one finite relation for every relation symbol in $\mathcal{D}$, of the
same arity. A relation is, in general, bag valued; that is, 
it is a bag (also called multiset) of tuples. A bag can be 
thought of as a set of elements (the core-set of the bag) 
with multiplicities attached to each element. We say 
that a relation is set valued if its cardinality coincides 
with the cardinality of its core-set. A database instance 
is, in general, bag valued. We say that a (bag-valued) 
database instance is set valued if all its relations are set 
valued. 

A conjunctive query (CQ query) $Q$ over a schema $\mathcal{D}$
is an expression of the form $Q(\bar{X})$ :- $\phi(\bar{X}, \bar{Y})$, where
$\phi(\bar{X}, \bar{Y})$ is a nonempty conjunction of atomic formulas
(i.e., relational atoms, also called subgoals) over $\mathcal{D}$. We
follow the usual notation and separate the atoms in a
query by commas. We call $Q(\bar{X})$ the head and $\phi(\bar{X}, \bar{Y})$
the body. We use a notation such as $\bar{X}$ for a vector of
$k$ variables and constants $X_1, \ldots, X_k$ (not necessarily
distinct). Every variable in the head must appear in
the body (i.e., $Q$ must be safe). The set of variables in
$\bar{Y}$ is assumed to be existentially quantified.

Given two conjunctions $\phi(\bar{U})$ and $\psi(\bar{V})$ of atomic formulas,
a homomorphism from $\phi(\bar{U})$ to $\psi(\bar{V})$ is a mapping
$h$ from the set of variables and constants in $\bar{U}$ to the
set of variables and constants in $\bar{V}$ such that (1) $h(c) = c$
for each constant $c$, and (2) for every atom $r(U_1, \ldots, U_n)$
of $\phi$, $r(h(U_1), \ldots, h(U_n))$ is in $\psi$. Given two CQ queries
$Q_1(\bar{X})$ :- $\phi(\bar{X}, \bar{Y})$ and $Q_2(\bar{X}')$ :- $\psi(\bar{X}', \bar{Y}')$, a containment
mapping from $Q_1$ to $Q_2$ is a homomorphism
$h$ from $\phi(\bar{X}, \bar{Y})$ to $\psi(\bar{X}', \bar{Y}')$ such that $h(\bar{X}) = \bar{X}'$.
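As an illustration of the definition above, the following minimal Python sketch searches for a containment mapping by backtracking. The encoding is an assumption of this sketch and not part of the paper: a query is a pair (head, body), an atom is a pair (predicate, argument tuple), and variables are strings that start with an uppercase letter. The function names are hypothetical.

```python
def is_var(term):
    # Assumption of this sketch: variables are capitalized strings,
    # everything else is a constant.
    return isinstance(term, str) and term[:1].isupper()

def homomorphisms(src_atoms, dst_atoms, h=None):
    """Yield every homomorphism from src_atoms to dst_atoms that extends h."""
    h = dict(h or {})
    if not src_atoms:
        yield h
        return
    (pred, args), rest = src_atoms[0], src_atoms[1:]
    for dpred, dargs in dst_atoms:
        if dpred != pred or len(dargs) != len(args):
            continue
        ext = dict(h)
        # Variables must be mapped consistently; constants map to themselves.
        if all(ext.setdefault(a, b) == b if is_var(a) else a == b
               for a, b in zip(args, dargs)):
            yield from homomorphisms(rest, dst_atoms, ext)

def containment_mapping(q1, q2):
    """Return a containment mapping from q1 to q2, or None if there is none."""
    (head1, body1), (head2, body2) = q1, q2
    if len(head1) != len(head2):
        return None
    seed = {}
    # The mapping must send the head of q1 onto the head of q2.
    for a, b in zip(head1, head2):
        if is_var(a):
            if seed.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return next(homomorphisms(body1, body2, seed), None)

q_small = (("X",), [("p", ("X", "Y"))])
q_large = (("X",), [("p", ("X", "Y")), ("r", ("X",))])
print(containment_mapping(q_small, q_large))
print(containment_mapping(q_large, q_small))
```

The first call finds the identity mapping on X and Y; the second finds nothing, because the atom r(X) has no image in the smaller query.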

For a conjunction $\phi(\bar{U})$ of atomic formulas, an assignment
$\gamma$ for $\phi(\bar{U})$ is a mapping of the variables of $\phi(\bar{U})$
to constants, and of the constants of $\phi(\bar{U})$ to themselves.
We use a notation such as $\gamma(\bar{X})$ to denote tuple
$(\gamma(X_1), \ldots, \gamma(X_k))$. Let relation $P_i$ in database $D$ correspond
to predicate $p_i$. Then we say that atom $p_i(\bar{X})$ is
satisfied by assignment $\gamma$ w.r.t. database $D$ if there exists
tuple $t \in P_i$ in $D$ such that $t = \gamma(\bar{X})$. Note that the
satisfying assignment $\gamma$ is a homomorphism from $p_i(\bar{X})$
to the ground atom $p_i(\gamma(\bar{X}))$ representing tuple $t$ in $P_i$.
Both the tuple-based definition of satisfaction and its
homomorphism formulation are naturally extended to
define satisfaction of conjunctions of atoms.

Query evaluation under set semantics. For a
CQ query $Q(\bar{X})$ :- $\phi(\bar{X}, \bar{Y})$ and for a database $D$,
suppose that there exists an assignment $\gamma$ for the body
$\phi(\bar{X}, \bar{Y})$ of $Q$, such that $\phi(\bar{X}, \bar{Y})$ is satisfied by $\gamma$ w.r.t.
$D$. Then we say that $Q$ returns a tuple $t = \gamma(\bar{X})$ on
$D$. Further, the answer $Q(D, S)$ to $Q$ on a set-valued
database $D$ under set semantics for query evaluation is
the set of all tuples that $Q$ returns on $D$.

Query equivalence under set semantics. Query
$Q_1$ is contained in query $Q_2$ under set semantics (set-contained,
denoted $Q_1 \sqsubseteq_S Q_2$) if $Q_1(D, S) \subseteq Q_2(D, S)$
for every set-valued database $D$. Query $Q_1$ is equivalent
to query $Q_2$ under set semantics (set-equivalent,
denoted $Q_1 \equiv_S Q_2$) if $Q_1 \sqsubseteq_S Q_2$ and $Q_2 \sqsubseteq_S Q_1$. A
classical result [2] states that a necessary and sufficient
condition for the set-containment $Q_1 \sqsubseteq_S Q_2$, for CQ
queries $Q_1$ and $Q_2$, is the existence of a containment
mapping from $Q_2$ to $Q_1$. This result forms the basis
for a sound and complete test for set-equivalence of CQ
queries, by definition of set-equivalence.

Canonical database. Every CQ query $Q$ can be
regarded as a symbolic database $D^{(Q)}$. $D^{(Q)}$ is defined
as the result of turning each subgoal $p_i(\ldots)$ of $Q$ into a
tuple in the relation $P_i$ that corresponds to predicate $p_i$.
The procedure is to keep each constant in the body of $Q$,
and to replace consistently each variable in the body of
$Q$ by a distinct constant different from all constants in
$Q$. The tuples that correspond to the resulting ground
atoms are the only tuples in the canonical database $D^{(Q)}$
for $Q$, which is unique up to isomorphism.
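The construction above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: atoms are (predicate, argument tuple) pairs, variables are capitalized strings, and the fresh constant for a variable X is the string "c_X", which presumes that no constant of the query already has that form.

```python
from collections import defaultdict

def is_var(term):
    # Assumption of this sketch: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()

def canonical_database(body):
    """Freeze each variable to its own fresh constant; keep constants as they are."""
    freeze = {}
    db = defaultdict(set)
    for pred, args in body:
        row = tuple(freeze.setdefault(a, "c_" + a) if is_var(a) else a
                    for a in args)
        db[pred].add(row)  # relations are sets, so duplicate subgoals collapse
    return dict(db)

body = [("p", ("X", "Y")), ("r", ("X",)), ("p", ("X", "Y"))]
print(canonical_database(body))
```

Because each relation is stored as a set, a query with duplicate subgoals yields the same canonical database as its duplicate-free version.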

2.2 Bag and Bag-Set Semantics 

In this section we provide definitions for query evaluation
under bag and bag-set semantics. Our definitions
are consistent with the semantics of evaluating
CQ queries in the SQL standard (see, e.g., [15]), as well
as with the corresponding definitions in [4, 18].

Query evaluation under bag-set semantics. Consider
a CQ query $Q(\bar{X})$ :- $\phi(\bar{X}, \bar{Y})$. The answer
$Q(D, BS)$ to $Q$ on a set-valued database $D$ under bag-set
semantics for query evaluation is the bag of all tuples
that $Q$ returns on $D$. That is, for each assignment $\gamma$ for
the body $\phi(\bar{X}, \bar{Y})$ of $Q$, such that $\phi(\bar{X}, \bar{Y})$ is satisfied
by $\gamma$ w.r.t. $D$, $\gamma$ contributes to the bag $Q(D, BS)$ a distinct
tuple $t = \gamma(\bar{X})$, such that $Q$ returns $t$ on $D$ w.r.t.
$\gamma$. (I.e., whenever $Q$ returns $t_1$ on $D$ w.r.t. $\gamma_1$ and $Q$
returns a copy $t_2$ of $t_1$ on $D$ w.r.t. $\gamma_2 \neq \gamma_1$, then each of
$t_1$ and $t_2$ is a separate element of the bag $Q(D, BS)$.)

Query evaluation under bag semantics. For a
CQ query $Q$, the answer $Q(D, B)$ to $Q$ on a bag-valued
database $D$ under bag semantics for query evaluation is
a bag of tuples computed as follows. Suppose $Q$ is

$Q(\bar{X})$ :- $p_1(\bar{X}_1), p_2(\bar{X}_2), \ldots, p_n(\bar{X}_n)$.

Consider the vector $p_1, \ldots, p_n$ of predicates (not necessarily
distinct) occurring in the body of $Q$, and let
$P_1, \ldots, P_n$ be the vector of relations in $D$ such that
each relation $P_i$ corresponds to predicate $p_i$. Whenever two
subgoals $p_i(\ldots)$ and $p_j(\ldots)$ of $Q$, with $i \neq j$, have the same
predicate, $P_i$ and $P_j$ refer to the same relation in $D$.

Let $\gamma$ be an assignment for the body of $Q$, such that
the body of $Q$ is satisfied by $\gamma$ w.r.t. $D$. Assignment $\gamma$
maps each subgoal $p_i(\bar{X}_i)$ of $Q$ into a tuple $t^{(i)}$ in relation
$P_i$. For each $i \in \{1, \ldots, n\}$, let $m_i$ be the number
of occurrences of tuple $t^{(i)}$ in the bag $P_i$. (I.e., $m_i$
is the multiplicity associated with the (unique copy of)
tuple $t^{(i)}$ in the core-set of $P_i$.) Then each distinct $\gamma$
contributes exactly $\prod_{i=1}^{n} m_i$ copies of tuple $t = \gamma(\bar{X})$ to
the bag $Q(D, B)$. (Recall that $\bar{X}$ is the vector of variables
and constants in the head of $Q$.) Further, the bag
$Q(D, B)$ has no other tuples.
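To make the three evaluation semantics concrete, here is a small Python sketch that computes a query answer as a multiset. It is an illustration only; the encoding is assumed rather than taken from the paper: each relation is a collections.Counter mapping a tuple of the core-set to its multiplicity, atoms are (predicate, argument tuple) pairs, and variables are capitalized strings. For the "S" and "BS" modes the database is assumed to be set valued, as the definitions require.

```python
from collections import Counter

def is_var(term):
    # Assumption of this sketch: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()

def assignments(body, db, gamma=None):
    """Yield (gamma, m_1 * ... * m_n) for every satisfying assignment gamma."""
    gamma = gamma or {}
    if not body:
        yield gamma, 1
        return
    (pred, args), rest = body[0], body[1:]
    for row, m in db.get(pred, {}).items():
        if len(row) != len(args):
            continue
        ext = dict(gamma)
        if all(ext.setdefault(a, v) == v if is_var(a) else a == v
               for a, v in zip(args, row)):
            for g, rest_mult in assignments(rest, db, ext):
                yield g, m * rest_mult

def answer(head, body, db, semantics):
    """Answer under 'S' (set), 'BS' (bag-set) or 'B' (bag) semantics."""
    out = Counter()
    for gamma, mult in assignments(body, db):
        t = tuple(gamma[a] if is_var(a) else a for a in head)
        if semantics == "B":
            out[t] += mult   # product of the multiplicities m_i
        elif semantics == "BS":
            out[t] += 1      # one copy per satisfying assignment
        else:
            out[t] = 1       # duplicates are discarded
    return out

head, body = ("X",), [("p", ("X", "Y")), ("r", ("X",))]
db_bag = {"p": Counter({(1, 2): 2, (1, 3): 1}), "r": Counter({(1,): 3})}
db_set = {"p": Counter({(1, 2): 1, (1, 3): 1}), "r": Counter({(1,): 1})}
print(answer(head, body, db_bag, "B"))
print(answer(head, body, db_set, "BS"))
print(answer(head, body, db_set, "S"))
```

On db_bag there are two satisfying assignments, contributing 2*3 and 1*3 copies of (1) under bag semantics. On the set-valued db_set the same two assignments give two copies under bag-set semantics and a single tuple under set semantics.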



2.3 Equivalence Tests for CQ Queries 

This subsection outlines equivalence tests for CQ
queries, for the cases of bag and bag-set semantics. The
classical equivalence test [2] for CQ queries for the case
of set semantics is described in Section 2.1.

Query equivalence under bag and bag-set semantics.
Query $Q_1$ is equivalent to query $Q_2$ under
bag semantics (bag-equivalent, denoted $Q_1 \equiv_B Q_2$) if for
all bag-valued databases $D$ it holds that $Q_1(D, B)$ and
$Q_2(D, B)$ are the same bags. Query $Q_1$ is equivalent to
query $Q_2$ under bag-set semantics (bag-set-equivalent,
$Q_1 \equiv_{BS} Q_2$) if for all set-valued databases $D$ it holds
that $Q_1(D, BS)$ and $Q_2(D, BS)$ are the same bags.

Proposition 2.1. [4] Given two CQ queries $Q_1$ and
$Q_2$, $Q_1 \equiv_B Q_2$ implies $Q_1 \equiv_{BS} Q_2$, and $Q_1 \equiv_{BS} Q_2$
implies $Q_1 \equiv_S Q_2$. □

For bag and bag-set semantics, the following conditions
are known for CQ query equivalence. (Query $Q_c$
is a canonical representation of query $Q$ if $Q_c$ is the
result of removing all duplicate atoms from $Q$.)

Theorem 2.1. [4] Let $Q$ and $Q'$ be CQ queries.
Then (1) $Q \equiv_B Q'$ iff $Q$ and $Q'$ are isomorphic. (2)
$Q \equiv_{BS} Q'$ iff $Q_c \equiv_B Q'_c$, where $Q_c$ and $Q'_c$ are canonical
representations of $Q$ and $Q'$, respectively. □
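The two tests of Theorem 2.1 can be sketched as follows. This is a brute-force illustration, not an efficient procedure, and it rests on assumptions that are not spelled out in the text: isomorphism is taken to mean a bijective renaming of variables that maps head to head and the bag of subgoals onto the bag of subgoals; queries are (head, body) pairs, atoms are (predicate, argument tuple) pairs, and variables are capitalized strings. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def is_var(term):
    # Assumption of this sketch: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()

def variables(query):
    head, body = query
    seen = []
    for t in list(head) + [a for _, args in body for a in args]:
        if is_var(t) and t not in seen:
            seen.append(t)
    return seen

def rename(query, h):
    """Apply the renaming h once; the body is kept as a bag of atoms."""
    head, body = query
    new_head = tuple(h.get(t, t) for t in head)
    new_body = Counter((p, tuple(h.get(a, a) for a in args)) for p, args in body)
    return new_head, new_body

def isomorphic(q1, q2):
    v1, v2 = variables(q1), variables(q2)
    if len(v1) != len(v2) or len(q1[1]) != len(q2[1]):
        return False
    target = rename(q2, {})
    return any(rename(q1, dict(zip(v1, perm))) == target
               for perm in permutations(v2))

def canonical_representation(query):
    head, body = query
    return head, list(dict.fromkeys(body))  # drop duplicate atoms

def bag_equivalent(q1, q2):          # Theorem 2.1 (1)
    return isomorphic(q1, q2)

def bag_set_equivalent(q1, q2):      # Theorem 2.1 (2)
    return isomorphic(canonical_representation(q1),
                      canonical_representation(q2))

q_dup = (("X",), [("p", ("X", "Y")), ("p", ("X", "Y"))])
q_one = (("A",), [("p", ("A", "B"))])
q_two = (("X",), [("p", ("X", "Y")), ("p", ("X", "Z"))])
print(bag_equivalent(q_dup, q_one), bag_set_equivalent(q_dup, q_one))
print(bag_set_equivalent(q_two, q_one))
```

The pair q_dup, q_one is bag-set equivalent but not bag equivalent, since a duplicated subgoal changes multiplicities only when the stored relations are bags. The pair q_two, q_one is set equivalent but fails even the bag-set test.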

2.4 Dependencies and Chase 

Embedded dependencies. We consider dependencies
$\sigma$ of the form

$\sigma : \phi(\bar{U}, \bar{W}) \rightarrow \exists \bar{V}\, \psi(\bar{U}, \bar{V})$

where $\phi$ and $\psi$ are conjunctions of atoms, which may
include equations. Such dependencies, called embedded
dependencies, are sufficiently expressive to specify all
usual integrity constraints, such as keys, foreign keys,
inclusion, and join dependencies [10]. If $\psi$ consists only
of equations, then $\sigma$ is an equality-generating dependency
(egd). If $\psi$ consists only of relational atoms, then
$\sigma$ is a tuple-generating dependency (tgd). Every set $\Sigma$
of embedded dependencies is equivalent to a set of tgds
and egds [1]. We write $D \models \Sigma$ if database $D$ satisfies all
the dependencies in $\Sigma$. All sets $\Sigma$ we refer to are finite.

Query containment and equivalence under dependencies.
We say that query $Q$ is set-equivalent
to query $P$ under a set of dependencies $\Sigma$, denoted
$Q \equiv_{\Sigma,S} P$, if for every set-valued database $D$ such that
$D \models \Sigma$ we have $Q(D, S) = P(D, S)$. The definition
of set containment under dependencies, denoted $\sqsubseteq_{\Sigma,S}$,
as well as the definitions of bag equivalence and bag-set
equivalence under dependencies (denoted by $\equiv_{\Sigma,B}$ and
$\equiv_{\Sigma,BS}$, respectively), are analogous modifications of
the respective definitions for the dependency-free setting,
see Sections 2.1 and 2.3.

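Chase. Assume a CQ query $Q(\bar{X})$ :- $\xi(\bar{X}, \bar{Y})$ and
a tgd $\sigma$ of the form $\phi(\bar{U}, \bar{W}) \rightarrow \exists \bar{V}\, \psi(\bar{U}, \bar{V})$. Assume
w.l.o.g. that $Q$ has none of the variables $\bar{V}$. The chase
of $Q$ with $\sigma$ is applicable if there is a homomorphism
$h$ from $\phi$ to $\xi$ and if, moreover, $h$ cannot be extended
to a homomorphism $h'$ from $\phi \wedge \psi$ to $\xi$. In that case,
a chase step of $Q$ with $\sigma$ and $h$ is a rewrite of $Q$ into
$Q'(\bar{X})$ :- $\xi(\bar{X}, \bar{Y}) \wedge \psi(h(\bar{U}), \bar{V})$.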

We now define a chase step with an egd. Assume a
CQ query $Q$ as before and an egd $e$ of the form $\phi(\bar{U}) \rightarrow
U_1 = U_2$. The chase of $Q$ with $e$ is applicable if there is a
homomorphism $h$ from $\phi$ to $\xi$ such that $h(U_1) \neq h(U_2)$
and at least one of $h(U_1)$ and $h(U_2)$ is a variable; assume
w.l.o.g. that $h(U_1)$ is a variable. Then a chase step of $Q$
with $e$ and $h$ is a rewrite of $Q$ into a query that results
from replacing all occurrences of $h(U_1)$ in $Q$ by $h(U_2)$.
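The two set-semantics chase steps defined above can be sketched in Python as follows. This is a minimal illustration under assumptions not made in the paper: atoms are (predicate, argument tuple) pairs, variables are capitalized strings, and fresh existential variables are named V0, V1, and so on, which presumes that the query does not already use such names. The helper names are hypothetical.

```python
from itertools import count

def is_var(term):
    # Assumption of this sketch: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()

def homs(src, dst, h):
    """Yield all homomorphisms from the atoms src into the atoms dst extending h."""
    if not src:
        yield dict(h)
        return
    (pred, args), rest = src[0], src[1:]
    for dpred, dargs in dst:
        if dpred != pred or len(dargs) != len(args):
            continue
        ext = dict(h)
        if all(ext.setdefault(a, b) == b if is_var(a) else a == b
               for a, b in zip(args, dargs)):
            yield from homs(rest, dst, ext)

_fresh = count()

def tgd_step(body, phi, psi):
    """One chase step with the tgd phi -> exists V psi; None if not applicable."""
    for h in homs(phi, body, {}):
        # Applicable only if h has no extension that also maps psi into the body.
        if next(homs(psi, body, h), None) is not None:
            continue
        ext = dict(h)
        new_atoms = []
        for pred, args in psi:
            new_args = []
            for a in args:
                if is_var(a) and a not in ext:
                    ext[a] = "V%d" % next(_fresh)  # fresh existential variable
                new_args.append(ext[a] if is_var(a) else a)
            new_atoms.append((pred, tuple(new_args)))
        return body + new_atoms
    return None

def egd_step(head, body, phi, u1, u2):
    """One chase step with the egd phi -> u1 = u2; None if not applicable."""
    for h in homs(phi, body, {}):
        a, b = h.get(u1, u1), h.get(u2, u2)
        if a == b:
            continue
        if not is_var(a):
            a, b = b, a          # make a the term that gets replaced
        if not is_var(a):
            continue             # two distinct constants: no step is defined here
        sub = lambda t: b if t == a else t
        return (tuple(sub(t) for t in head),
                [(p, tuple(sub(t) for t in args)) for p, args in body])
    return None

body = [("p", ("X", "Y"))]
step1 = tgd_step(body, [("p", ("A", "B"))], [("r", ("A",))])
print(step1)
print(tgd_step(step1, [("p", ("A", "B"))], [("r", ("A",))]))
key = [("s", ("K", "Z1")), ("s", ("K", "Z2"))]
print(egd_step(("X",), [("s", ("X", "A")), ("s", ("X", "B"))], key, "Z1", "Z2"))
```

The second call to tgd_step returns None because the homomorphism can now be extended to the conclusion of the tgd, so the step is no longer applicable. The egd step replaces the variable A by B and, as in the text, does not remove the resulting duplicate subgoal.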

A $\Sigma$-chase sequence $C$ (or just chase sequence, if $\Sigma$
is clear from the context) is a sequence of CQ queries
$Q_0, Q_1, \ldots$ such that every query $Q_{i+1}$ ($i \geq 0$) in $C$ is
obtained from $Q_i$ by a chase step $Q_i \Rightarrow Q_{i+1}$ using a
dependency $\sigma \in \Sigma$. A chase sequence $Q = Q_0, Q_1, \ldots, Q_n$
is terminating under set semantics if $D^{(Q_n)} \models \Sigma$, where
$D^{(Q_n)}$ is the canonical database for $Q_n$. In this case
we say that $(Q)_{\Sigma,S} = Q_n$ is the (terminal) result of
the chase. Chase of CQ queries under set semantics is
known to terminate in finite time for a class of embedded
dependencies called weakly acyclic dependencies,
see [13] and references therein. Under set semantics,
all chase results for a given CQ query are equivalent in
the absence of dependencies [10].

The following result is immediate from [1, 9, 10].

Theorem 2.2. Given CQ queries $Q_1$, $Q_2$ and set
$\Sigma$ of embedded dependencies. Then $Q_1 \equiv_{\Sigma,S} Q_2$ iff
$(Q_1)_{\Sigma,S} \equiv_S (Q_2)_{\Sigma,S}$ in the absence of dependencies. □

2.5 Queries with Grouping and Aggregation 

We assume that the data we want to aggregate are
real numbers, $\mathbb{R}$. If $S$ is a set, then $\mathcal{M}(S)$ denotes the
set of finite bags over $S$. A $k$-ary aggregate function is a
function $\alpha : \mathcal{M}(\mathbb{R}^k) \rightarrow \mathbb{R}$ that maps bags of $k$-tuples
of real numbers to real numbers. An aggregate term is
an expression built up using an aggregate function over
variables. Every aggregate term with $k$ variables gives
rise to a $k$-ary aggregate function in a natural way.

We use $\alpha(y)$ as an abstract notation for a unary aggregate
term, where $y$ is the variable in the term. Aggregate
queries that we consider have (unary or 0-ary)
aggregate functions count, count(*), sum, max, and
min. Note that count is over an argument, whereas
count(*) is the only function that we consider here that
takes no argument. (There is a distinction in SQL semantics
between count and count(*).) In the rest of the
paper, we will not refer again to the distinction between
count and count(*), as our results carry over.

An aggregate query [8, 22] is a conjunctive query augmented
by an aggregate term in its head. For a query
with a $k$-ary aggregate function $\alpha$, the syntax is:

$Q(\bar{S}, \alpha(\bar{Y})) \leftarrow A(\bar{S}, \bar{Y}, \bar{Z})$. (1)

$A$ is a conjunction of atoms; $\alpha(\bar{Y})$ is a $k$-ary aggregate
term; $\bar{S}$ are the grouping attributes of $Q$; none of the
variables in $\bar{Y}$ appears in $\bar{S}$. Finally, $Q$ is safe: all
variables in $\bar{S}$ and $\bar{Y}$ occur in $A$. We consider queries
with unary aggregate functions sum, count, max, and
min. With each aggregate query $Q$ as in Equation (1),
we associate its CQ core $\breve{Q}$: $\breve{Q}(\bar{S}, \bar{Y}) \leftarrow A(\bar{S}, \bar{Y}, \bar{Z})$.

We define the semantics of an aggregate query as follows:
Let $D$ be a set-valued database and $Q$ an aggregate
query as in Equation (1). When $Q$ is applied on $D$
it yields a relation $Q(D)$ defined by the following three
steps: First, we compute the bag $B = \breve{Q}(D, BS)$ of the
core $\breve{Q}$ of $Q$ on $D$. We then form equivalence classes in $B$: Two tuples
belong to the same equivalence class if they agree on
the values of all the grouping arguments of $Q$. This is
the grouping step. The third step is aggregation; it associates
with each equivalence class a value that is the
aggregate function computed on a bag that contains all
values of the input argument(s) of the aggregated attribute(s)
in this class. For each class, it returns one
tuple, which contains the values of the grouping arguments
of $Q$ and the computed aggregated value.
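The grouping and aggregation steps just described can be sketched as below. The sketch assumes, as a simplification of its own, that the bag $B$ of the core has already been computed and is given as a collections.Counter over tuples whose first n_group components are the grouping values and whose last component is the aggregated value; the function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict

AGG = {"sum": sum, "count": len, "max": max, "min": min}

def aggregate(core_bag, n_group, agg):
    """Group the bag of the core by the first n_group components, then aggregate."""
    groups = defaultdict(list)
    for row, mult in core_bag.items():
        # Grouping step: every copy of a tuple contributes its value once.
        groups[row[:n_group]].extend([row[n_group]] * mult)
    # Aggregation step: one output tuple per equivalence class.
    return {g + (AGG[agg](ys),) for g, ys in groups.items()}

B = Counter({("a", 1): 2, ("b", 5): 1})   # the bag {{(a,1), (a,1), (b,5)}}
print(aggregate(B, 1, "sum"))
print(aggregate(B, 1, "count"))
print(aggregate(B, 1, "max"))
```

In this example the duplicate copy of (a, 1) changes the sum and count results for group a but leaves max (and min) unchanged, which illustrates why multiplicities in the core's answer matter for some aggregate functions and not for others.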

In general, queries with different aggregate functions
may be equivalent [8]. We follow the approach of [8,
22] by considering equivalence between queries with the
same lists of head arguments, called compatible queries.

Definition 2.1. (Equivalence of compatible aggregate
queries [22]) For queries $Q(\bar{X}, \alpha(\bar{Y})) \leftarrow A(\bar{S})$
and $Q'(\bar{X}, \alpha(\bar{Y})) \leftarrow A'(\bar{S}')$, $Q \equiv Q'$ if $Q(D) = Q'(D)$
for every database $D$. □

We say that two compatible aggregate queries $Q$ and
$Q'$ are equivalent in presence of a set of dependencies $\Sigma$,
$Q \equiv_{\Sigma} Q'$, if $Q(D) = Q'(D)$ for every database $D \models \Sigma$.



Theorem 2.3. [8, 22] (1) Equivalence of sum- and
of count-queries can be reduced to bag-set equivalence of
their cores. (2) Equivalence of max- and of min-queries
can be reduced to set equivalence of their cores. □

3. PROBLEM STATEMENT 

In this section we use the following notation: Let $X$
be the semantics for query evaluation, with values $S$, $B$,
and $BS$ for set, bag, or bag-set semantics, respectively.
Let $\mathcal{L}_1$ and $\mathcal{L}_2$ be two query languages. Let $\Sigma$ be a finite
set of dependencies on database schema $\mathcal{D}$.

We use the notion of $\Sigma$-minimality [11], defined as
follows. (Intuitively, reformulation $R$ of query $Q$ is not
$\Sigma$-minimal if at least one egd in $\Sigma$ is applicable to $R$.)

Definition 3.1. (Minimality under dependencies
[11]) A CQ query $Q$ is $\Sigma$-minimal if there are no
queries $S_1$, $S_2$ where $S_1$ is obtained from $Q$ by replacing
zero or more variables with other variables of $Q$, and $S_2$
by dropping at least one atom from $S_1$ such that $S_1$ and
$S_2$ remain equivalent to $Q$ under $\Sigma$. □

We extend this definition to $\Sigma$-minimality of CQ
queries with grouping and aggregation, which is defined
as $\Sigma$-minimality of the (unaggregated) core of the query,
see Section 2.5 for the relevant definitions.

A general statement of the Query-Reformulation
Problem that we consider in this paper is as follows:
The problem input is $(\mathcal{D}, X, Q, \Sigma, \mathcal{L}_2)$, where query $Q$ is
defined on database schema $\mathcal{D}$ in language $\mathcal{L}_1$. A solution
to the Query-Reformulation Problem, for a problem
input $(\mathcal{D}, X, Q, \Sigma, \mathcal{L}_2)$, is a query $Q'$ defined in language
$\mathcal{L}_2$ on $\mathcal{D}$, such that $Q' \equiv_{\Sigma,X} Q$.

In this paper we consider the Query-Reformulation
Problem in presence of embedded dependencies, and focus
on (1) the CQ class of the problem, where each of
$\mathcal{L}_1$ and $\mathcal{L}_2$ is the language of CQ queries, and on (2)
the CQ-aggregate class (see Section 6.3), where each of
$\mathcal{L}_1$ and $\mathcal{L}_2$ is the language of CQ queries with grouping
and aggregation, using aggregate functions sum, max,
min, and count; we refer to this query language as CQ-aggregate.
For both classes, we consider only $\Sigma$-minimal
solutions of the Query-Reformulation Problem.

4. SOUND CHASE UNDER BAG AND BAG-SET SEMANTICS

In this section we show that under bag and bag-set
semantics, it is incorrect to enforce the set-semantics
condition of $D^{(Q_n)} \models \Sigma$ (Section 2.4) on the terminal
chase result $Q_n$ of query $Q$ under dependencies $\Sigma$. The
problem is that under this condition, chase may yield
a result $Q_n$ that is not equivalent to the original query
$Q$ in presence of $\Sigma$. That is, soundness of chase, understood
as $Q_n \equiv_{\Sigma,B} Q$ or $Q_n \equiv_{\Sigma,BS} Q$, may not hold.
We then formulate sufficient and necessary conditions
for soundness of chase for CQ queries and embedded
dependencies under bag and bag-set semantics.

In this section we also show that constraints that force 
certain relations to be sets on all instances of a given 
database schema can be defined as egds, provided that 
row (tuple) IDs are defined for the respective relations.
Finally, we extend the condition of Theorem 2.1 for bag
equivalence of CQ queries, to those cases where some 
relations are required to be set valued in all instances 
of the given schema. Such requirements can be defined 
as our set-enforcing egds. 

4.1 Motivating Example 

Let us conjecture that an analog of Theorem 2.2
(Section 2.4) holds for the case of bag semantics.
(In this section we discuss in detail the case of bag semantics
only; analogous reasoning is valid for the case
of bag-set semantics.) That is, maybe $Q_1 \equiv_{\Sigma,B} Q_2$ if
and only if $(Q_1)_{\Sigma,S} \equiv_B (Q_2)_{\Sigma,S}$ in the absence of dependencies,
for a given pair of CQ queries $Q_1$ and $Q_2$
and for a given set $\Sigma$ of embedded dependencies. (We
obtain our conjecture by replacing the symbols $\equiv_{\Sigma,S}$
and $\equiv_S$ in Theorem 2.2 by the bag-semantics versions
of these symbols.)

Now consider the C&B algorithm by Deutsch and colleagues
[11]. Under set semantics for query evaluation
and given a CQ query $Q$, C&B outputs all equivalent $\Sigma$-minimal
conjunctive reformulations of $Q$ in presence of
the given embedded dependencies $\Sigma$ (i.e., C&B is sound
and complete), whenever chase of $Q$ under $\Sigma$ terminates
in finite time. See Appendix A for the details on C&B.

If our conjecture is valid, then a straightforward modification
of C&B gives us a procedure for solving instances
in the CQ class of the Query-Reformulation
Problem for bag semantics.^1 The only difference between
the original C&B and its proposed modification
would be the test for bag, rather than set, equivalence
(see Theorem 2.1) between the universal plan $(Q)_{\Sigma,S}$ of
C&B for the input query $Q$ and dependencies $\Sigma$, and
the terminal result of chasing a candidate reformulation.
(These terms are defined in Appendix A.) By extension
from C&B, our algorithm would be sound and complete
for all problem instances where the universal plan for $Q$
could be computed in finite time.

Unfortunately, this naive extension of C&B would not
be sound for bag semantics (or for bag-set semantics, in
the version of C&B using the bag-set equivalence test of
Theorem 2.1). We highlight the problems in an example.

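EXAMPLE 4.1. On database schema $\mathcal{D} = \{P, R, S, T, U\}$,
consider a set $\Sigma$ that includes four tgds:

$\sigma_1 : p(X, Y) \rightarrow s(X, Z) \wedge t(X, V, W)$

$\sigma_2 : p(X, Y) \rightarrow t(X, Y, W)$

$\sigma_3 : p(X, Y) \rightarrow r(X)$

$\sigma_4 : p(X, Y) \rightarrow u(X, Z) \wedge t(X, Y, W)$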

Suppose E also includes dependencies enforcing the 
following constraints: (1) Relations S and T (but not 

^An analogous extension of C&B would work for instances 
{V, BS, Q, E, CQ), i.e., under bag-set semantics. 



R or U) are set valued in all instances of $\mathcal{D}$; call these
constraints $\sigma_5$ and $\sigma_6$, respectively. (These dependencies
are relevant to the bag-semantics case. Under bag-set
or set semantics, all relations in all instances of $\mathcal{D}$
are set valued by definition.) Please see Section \4.S\ for 
an approach to expressing such constraints using egds. 
(2) The first attribute of S is the key of S (egd $\sigma_7$), and
the first two attributes of T are the key of T (egd $\sigma_8$);
see Appendix B for the definition of keys.

Consider CQ queries $Q_1$ through $Q_4$, defined as

$Q_1(X) :- p(X, Y), t(X, Y, W), s(X, Z), r(X), u(X, U).$
$Q_2(X) :- p(X, Y), t(X, Y, W), s(X, Z), r(X).$
$Q_3(X) :- p(X, Y), t(X, Y, W), s(X, Z).$
$Q_4(X) :- p(X, Y).$

(We disregard queries Q2 and Q3 for the moment.) 

We can show that $Q_1 \equiv_{\Sigma,S} Q_4$. Thus, $Q_1$ is a
reformulation of $Q_4$ in presence of $\Sigma$ under set semantics.
At the same time, $Q_1$ and $Q_4$ are not equivalent
under set semantics in the absence of dependencies.

Our naive modification of C&B would return a
reformulation $Q_1$ of query $Q_4$. Indeed, each of $(Q_1)_{\Sigma,S}$ and
$(Q_4)_{\Sigma,S}$ is isomorphic to $Q_1$, thus by Theorem 2.1 we
have that $(Q_1)_{\Sigma,S} \equiv_B (Q_4)_{\Sigma,S}$.

However, even though $(Q_1)_{\Sigma,S} \equiv_B (Q_4)_{\Sigma,S}$, it is not
true that $Q_1 \equiv_{\Sigma,B} Q_4$. The counterexample is a
bag-valued database D, $D \models \Sigma$, with relations $P = \{\{(1, 2)\}\}$,
$R = \{\{(1)\}\}$, $S = \{\{(1, 3)\}\}$, $T = \{\{(1, 2, 4)\}\}$, and
$U = \{\{(1, 5), (1, 6)\}\}$. On the database D, the answer to $Q_4$
under bag semantics is $Q_4(D, B) = \{\{(1)\}\}$, whereas
$Q_1(D, B) = (Q_1)_{\Sigma,S}(D, B) = (Q_4)_{\Sigma,S}(D, B) = \{\{(1), (1)\}\}$.
From the fact that $Q_1(D, B)$ and $Q_4(D, B)$ are
not the same bags, we conclude that $Q_1 \not\equiv_{\Sigma,B} Q_4$.

The same database D (which is set valued) would
disprove $Q_1 \equiv_{\Sigma,BS} Q_4$ (i.e., equivalence of $Q_1$ and $Q_4$
under $\Sigma$ and bag-set semantics), even though it is true
by Theorem 2.1 that $(Q_1)_{\Sigma,S} \equiv_{BS} (Q_4)_{\Sigma,S}$. □
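
The multiplicities claimed in Example 4.1 can be checked mechanically. The following is a minimal Python sketch and not part of the original development: the function name `evaluate`, the encoding of subgoals as (relation, variables) pairs, and the representation of bags as Python lists are all assumptions made for illustration. For each head tuple, it counts the combinations of stored tuple occurrences that are consistent with the query body, which is how bag semantics assigns multiplicities to answers of CQ queries.

```python
from collections import Counter
from itertools import product


def evaluate(head, body, db):
    """Evaluate a CQ query under bag semantics (illustrative helper).

    head: tuple of variable names.
    body: list of (relation_name, variable_names) subgoals.
    db:   dict mapping relation_name -> list of tuples; a list may hold
          duplicates, so every relation is treated as a bag.
    """
    answers = Counter()
    # Choose one stored tuple occurrence per subgoal; every consistent
    # choice contributes one copy of the corresponding head tuple.
    for combo in product(*(db[rel] for rel, _ in body)):
        assignment = {}
        consistent = all(
            assignment.setdefault(var, value) == value
            for (_, variables), row in zip(body, combo)
            for var, value in zip(variables, row)
        )
        if consistent:
            answers[tuple(assignment[v] for v in head)] += 1
    return answers


# Database D of Example 4.1.
D = {
    "p": [(1, 2)],
    "r": [(1,)],
    "s": [(1, 3)],
    "t": [(1, 2, 4)],
    "u": [(1, 5), (1, 6)],
}

Q1 = [("p", ("X", "Y")), ("t", ("X", "Y", "W")), ("s", ("X", "Z")),
      ("r", ("X",)), ("u", ("X", "U"))]
Q4 = [("p", ("X", "Y"))]

print(evaluate(("X",), Q1, D))  # Counter({(1,): 2})
print(evaluate(("X",), Q4, D))  # Counter({(1,): 1})
```

The two U-tuples with first attribute 1 are what doubles the multiplicity of the answer to $Q_1$; no dependency in the example restricts relation U.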

4.2 Sound Chase Steps 

The problem highlighted in Example 4.1 is unsoundness
of set-semantics chase when applied to query Q4
under bag or bag-set semantics. To rectify this prob- 
lem, that is to make chase sound under these semantics, 
we modify the definitions of chase steps. 

Given a CQ query Q and a set of embedded dependencies
$\Sigma$, let Q' be the result of applying to query
Q a dependency $\sigma \in \Sigma$. We say that the chase step
$Q \Rightarrow^{\sigma}_{B} Q'$ is sound under bag semantics [9]
($Q \Rightarrow^{\sigma}_{BS} Q'$ is sound under bag-set semantics, respectively)
if it holds that $Q \equiv_{\Sigma,B} Q'$ (that $Q \equiv_{\Sigma,BS} Q'$,
respectively). By extension of the above definitions, all chase
steps under embedded dependencies are sound under set semantics.
The definitions of sound chase steps are naturally extended
to those of sound chase sequences under each semantics.
We say that a chase result $Q_n$ is sound w.r.t.
$(Q, \Sigma)$ under bag semantics (under bag-set semantics,
respectively) whenever there exists a $\Sigma$-chase sequence
C that starts with the input query Q and ends with $Q_n$,
and such that all chase steps in C are sound under bag
semantics (under bag-set semantics, respectively).

4.2.1 Regularized Assignment-Fixing Tgds 

Toward ensuring soundness of chase under bag and
bag-set semantics, we will define key-based chase using
tgds, see Section 4.2.3. For our definition we will need



the technical notions of "regularized tgds" and "assign- 
ment-fixing tgds", which we formally define and charac- 
terize in this subsection. 

Regularized tgds. 

Consider a tgd $\sigma : \phi(X, Y) \rightarrow \exists Z\, \psi(X, Z)$ whose
right-hand side $\psi$ has at least two relational atoms. Let
$\psi_a$ and $\psi_b$ be a partition of $\psi$ (where $\psi$ is viewed as a set of
relational atoms) in $\sigma$ into two disjoint nonempty sets,
that is, $\psi_a \neq \emptyset$, $\psi_b \neq \emptyset$, $\psi_a \cap \psi_b = \emptyset$,
and $\psi_a \cup \psi_b = \psi$.

Let A be all the variables in $\psi_a$, and let B be all the
variables in $\psi_b$. We call $\psi_a$ and $\psi_b$ a nonshared partition
of $\psi$ in $\sigma$ whenever $A \cap B \subseteq X$. (Recall that all the
variables in X are universally quantified in $\sigma$.) In case
where $\psi_a$ and $\psi_b$ are two disjoint nonempty sets such
that $\psi_a \cup \psi_b = \psi$ and $A \cap B \cap Z \neq \emptyset$, we call $\psi_a$ and $\psi_b$
a shared partition of $\psi$ in $\sigma$.

Definition 4.1. (Regularized tgd, regularized
set of embedded dependencies) A tgd $\sigma : \phi \rightarrow \psi$
is a regularized tgd if there exists no nonshared partition
of the set of relational atoms of $\psi$ into two disjoint
nonempty sets.^ We say that a finite set $\Sigma$ of embedded
dependencies is a regularized set of (embedded) dependencies
if each tgd in $\Sigma$ is regularized. □

Sets $\{u(X, Z)\}$ and $\{t(X, Y, W)\}$ comprise a nonshared
partition of the right-hand side of tgd $\sigma_4$ in Example 4.1;
therefore, the tgd $\sigma_4$ is not regularized. For a tgd $\sigma_1$ in
Example 4.2, where $\sigma_1 : p(X, Y) \rightarrow \exists Z\, \exists W\ r(X, Z) \wedge
s(Z, W)$, sets $\{r(X, Z)\}$ and $\{s(Z, W)\}$ comprise a shared
partition of the right-hand side of $\sigma_1$, because an
existential variable Z of $\sigma_1$ is present in both elements of
the partition. This tgd is regularized by Definition 4.1.
The set $\Sigma$ in Example 4.6 is a regularized set of
dependencies.

Consider a tgd $\sigma : \phi \rightarrow \psi$ that is not regularized
by Definition 4.1. The process of regularizing $\sigma$ is the
process of constructing from $\sigma$ a set $\Sigma_\sigma = \{\sigma_1, \ldots, \sigma_k\}$
of tgds, where $k \geq 2$ and such that for each tgd $\sigma_i$ in
$\Sigma_\sigma$, (i) the left-hand side of $\sigma_i$ is the left-hand side
of $\sigma$; (ii) the right-hand side of $\sigma_i$ is a nonempty set
of atoms $\psi_i \subseteq \psi$ (recall that $\psi$ is the right-hand side of
$\sigma$), with all the existential variables (of $\psi$) in $\psi_i$ marked
as such in $\sigma_i$; (iii) $\sigma_i$ is regularized by Definition 4.1;
and (iv) $\bigcup_{i=1}^{k} \psi_i = \psi$. It is easy to see that given a
non-regularized tgd $\sigma$, the recursive algorithm of finding
nonshared partitions of the right-hand side of $\sigma$ (a)
regularizes $\sigma$ correctly, (b) results in a unique set $\Sigma_\sigma$,
and (c) has the complexity 0{m^ log to), where to is 
the number of relational atoms in the right-hand side 
of $\sigma$. (The idea of the algorithm is to (1) give a unique
ID $id(a_j)$ to each relational atom $a_j$ of $\psi$, to then (2)
associate with each $id(a_j)$ the set of all those variables
of $a_j$ that are existentially quantified in $\sigma$, and to then
(3) recursively sort all the ids, each time by one fixed
variable in their associated variable lists, and to either
start a new nonshared partition using the sorted list, or
to add atoms to an existing nonshared partition, again
using the sorted list.) We call $\Sigma_\sigma$ the regularized set of
$\sigma$.
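
One way to read the regularization step is as a connected-components computation: two right-hand-side atoms must stay in the same tgd exactly when they are linked, directly or through other atoms, by shared existential variables. The Python sketch below illustrates this reading. It is a simpler stand-in for the sort-based procedure outlined above, not a transcription of it, and the name `regularize` and the encoding of atoms as (relation, variables) pairs are assumptions made for illustration.

```python
def regularize(rhs, existential):
    """Split the right-hand side of a tgd into regularized groups.

    rhs:         list of (relation_name, variable_names) atoms.
    existential: set of the existentially quantified variable names.
    Returns a list of atom groups; each group is the right-hand side of
    one tgd in the regularized set (the left-hand side is unchanged).
    """
    groups = []  # list of (existential variables of the group, its atoms)
    for atom in rhs:
        _, variables = atom
        ex = {v for v in variables if v in existential}
        merged_vars, merged_atoms, untouched = set(ex), [], []
        for group_vars, group_atoms in groups:
            if group_vars & ex:
                # The new atom shares an existential variable with this
                # group, so the two cannot be separated.
                merged_vars |= group_vars
                merged_atoms += group_atoms
            else:
                untouched.append((group_vars, group_atoms))
        merged_atoms.append(atom)
        groups = untouched + [(merged_vars, merged_atoms)]
    return [atoms for _, atoms in groups]


# sigma_4 of Example 4.1: p(X,Y) -> u(X,Z) AND t(X,Y,W), with Z, W existential.
sigma4_rhs = [("u", ("X", "Z")), ("t", ("X", "Y", "W"))]
# sigma_1 of Example 4.2: p(X,Y) -> r(X,Z) AND s(Z,W), with Z, W existential.
sigma1_rhs = [("r", ("X", "Z")), ("s", ("Z", "W"))]

print(regularize(sigma4_rhs, {"Z", "W"}))  # two groups: [u(X,Z)] and [t(X,Y,W)]
print(regularize(sigma1_rhs, {"Z", "W"}))  # one group: [r(X,Z), s(Z,W)]
```

An atom that contains no existential variables always ends up in a group of its own, since it forms a nonshared partition with any other set of atoms.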

Now given a finite set $\Sigma$ of arbitrary embedded egds
and tgds, we regularize $\Sigma$ by regularizing each tgd in

^Trivially, every tgd whose right-hand side has exactly one 
atom is a regularized tgd. 



$\Sigma$ as described above. We say that $\Sigma'$ is a regularized
version of $\Sigma$ if (i) for each egd $\sigma$ in $\Sigma$, $\Sigma'$ also has $\sigma$,
(ii) for each tgd $\sigma$ in $\Sigma$, $\Sigma'$ has the regularized set of
$\sigma$ and, finally, (iii) $\Sigma'$ has no other dependencies. For
each $\Sigma$ as above, it is easy to see that $\Sigma'$ is regularized
by Definition 4.1 and is unique.

The following results, in Proposition 4.1, are immediate
from Definition 4.1 and from the constructions in
this subsection.

Proposition 4.1. For a finite set $\Sigma$ of embedded egds
and tgds defined on schema $\mathcal{D}$, let $\Sigma'$ be the regularized
version of $\Sigma$. Then

• For every bag-valued database D with schema $\mathcal{D}$,
$D \models \Sigma$ iff $D \models \Sigma'$; and

• For every CQ query Q defined on $\mathcal{D}$, chase of Q
under set semantics in presence of $\Sigma$ terminates
in finite time iff chase of Q under set semantics
in presence of $\Sigma'$ terminates in finite time, and
$(Q)_{\Sigma,S} \equiv_S (Q)_{\Sigma',S}$ provided both chase results
exist. □

Assignment-fixing tgds. 

In the remainder of the paper, whenever we refer to 
a set of embedded dependencies, we assume that we are 
discussing (or using) its regularized version. We now 
define assignment-fixing tgds. The idea is to be able to 
determine easily which tgds ensure sound chase steps 
under each of bag and bag-set semantics. The intuition 
is as follows. Suppose chase with tgd $\sigma$ is applicable to
a CQ query Q as defined in Section 2.4 (i.e., assuming
set semantics), but we are looking at the implications
of applying the chase under bag or bag-set semantics
rather than under set semantics. Suppose further that
the right-hand side of $\sigma$ has existential variables. Then
we would like to add subgoals to Q, that is to perform
on Q the chase step $Q \Rightarrow^{\sigma} Q'$, exactly in those cases
where each consistent assignment to all body variables
of Q, w.r.t. any (arbitrary) database D that satisfies
the input dependencies, can be extended to one and
only one consistent assignment to all body variables of
Q' w.r.t. D. Otherwise Q would not be equivalent to Q'
in presence of $\sigma$ and under the chosen query-evaluation
semantics. It turns out that the characterization we 
are seeking is, in general, query dependent. (See Exam- 
ples SSI and O ) 

We now formalize this intuition of prohibiting, in chase,
"incorrect" multiplicity of the answer to the given query
in presence of the given dependencies, under bag or bag-set
semantics. Consider a CQ query $Q(A) :- \zeta(A, B)$,
and a regularized tgd $\sigma : \phi(X, Y) \rightarrow \exists Z\, \psi(X, Z)$ that
has at least one existential variable, that is Z is not
empty. Suppose that chase of Q with $\sigma$ is applicable,
using a homomorphism h from $\phi$ to Q. We come up
with a substitution $\theta$ of all existential variables Z in
the right-hand side of the tgd $\sigma$, such that $\theta$ replaces
each variable in Z by a fresh variable that is not used in
any capacity (i.e., neither universally nor existentially
quantified) in $\sigma$ or in $\zeta$. (Observe that $\theta$ always exists.)
We use h and $\theta$ to define for Q and $\sigma$ an associated test
query $Q^{\sigma,h,\theta}$:

$Q^{\sigma,h,\theta}(A) :- \zeta(A, B) \wedge \psi(h(X), Z) \wedge \psi(h(X), \theta(Z)).$    (2)



Observe that for any pair $(\theta_1, \theta_2)$ of substitutions that
satisfy the conditions on $\theta$ above, $Q^{\sigma,h,\theta_1}$ and $Q^{\sigma,h,\theta_2}$ are
isomorphic. Hence $Q^{\sigma,h,\theta}$ is unique up to isomorphism
w.r.t. $\theta$, and we choose one arbitrary $\theta$ for $Q^{\sigma,h,\theta}$ in the
remainder of the paper.
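
The construction of the associated test query in Equation (2) is purely syntactic, and a small Python sketch may make it concrete. The names `associated_test_query` and `fresh_name`, and the encoding of atoms as (relation, variables) pairs, are assumptions made for illustration only. The sketch further assumes that the existential variable names of the tgd do not already occur in the query, so that the first copy of the right-hand side can keep Z unchanged, as in Equation (2).

```python
def fresh_name(base, used):
    """Return a variable name derived from `base` that is not in `used`."""
    i = 1
    while f"{base}{i}" in used:
        i += 1
    return f"{base}{i}"


def associated_test_query(query_body, rhs, existential, h):
    """Build the body of the associated test query and the substitution theta.

    query_body:  list of (relation_name, variable_names) subgoals of Q.
    rhs:         right-hand side atoms of the regularized tgd.
    existential: set of existential variable names of the tgd (assumed
                 not to occur in the query).
    h:           dict mapping universal tgd variables to query variables.
    """
    used = {v for _, vs in query_body for v in vs}
    used |= {v for _, vs in rhs for v in vs}
    used |= set(h) | set(h.values())

    theta = {}
    for z in sorted(existential):
        theta[z] = fresh_name(z, used)
        used.add(theta[z])

    def image(atom, subst):
        rel, vs = atom
        return (rel, tuple(subst.get(v, v) for v in vs))

    first_copy = [image(a, h) for a in rhs]                  # psi(h(X), Z)
    second_copy = [image(a, {**h, **theta}) for a in rhs]    # psi(h(X), theta(Z))
    return query_body + first_copy + second_copy, theta


# Query Q(X) :- p(X,Y) and tgd p(X,Y) -> EXISTS Z, W: r(X,Z) AND s(Z,W).
body, theta = associated_test_query(
    [("p", ("X", "Y"))],
    [("r", ("X", "Z")), ("s", ("Z", "W"))],
    {"Z", "W"},
    {"X": "X", "Y": "Y"},
)
print(theta)  # {'W': 'W1', 'Z': 'Z1'}
print(body)
```

The resulting body is p(X,Y), r(X,Z), s(Z,W), r(X,Z1), s(Z1,W1). Any other choice of fresh names yields an isomorphic query, in line with the uniqueness observation above.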

We now treat the case where $\sigma$ has no existential
variables. In this case $\theta$ is trivially empty, $\theta = \emptyset$, and we
define the associated test query $Q^{\sigma,h,\theta}$ for Q, $\sigma$, and h
as above as:

$Q^{\sigma,h,\theta}(A) :- \zeta(A, B) \wedge \psi(h(X), Z).$    (3)

That is, $Q^{\sigma,h,\theta}$ is the result of applying to the query
Q a chase step using $\sigma$, as defined in Section 2.4 in this
paper.

We stress again that Equation (3) is defined only for
those cases where $\sigma$ has no existential variables. However,
Equation (3) can be obtained from Equation (2) by
setting $\theta = \emptyset$ and by removing duplicate subgoals from
the body of the query in Equation (2). Therefore, in what
follows we adopt Equation (2) as the definition of the
associated test query for Q and $\sigma$ regardless of whether $\sigma$
has existential variables.

Definition 4.2. (Associated test query) Given a
CQ query Q and a regularized tgd $\sigma$ such that chase
using $\sigma$ is applicable to Q using homomorphism h, the
associated test query for Q, $\sigma$, and h is as shown in
Equation (2). □

We now define assignment-fixing tgds, which enable 
sound chase steps under each of bag and bag-set se- 
mantics, under an extra condition under bag seman- 
tics that all the subgoals being added in the chase step 
correspond to set- valued relations. We first ensure cor- 
rectness of the definition of assignment-fixing tgds, by 
making a straightforward observation. 

Proposition 4.2. Consider a CQ query Q, a finite
set $\Sigma$ of tgds and egds, and a regularized tgd^ $\sigma \in \Sigma$
such that chase using $\sigma$ applies to Q with a homomorphism
h. Then the terminal chase result $(Q^{\sigma,h,\theta})_{\Sigma,S}$
exists whenever $(Q)_{\Sigma,S}$ exists. □

Proof. (Sketch.) Trivial for the case where $\sigma$ has no
existential variables. For the remaining case, the proof
is by contradiction. Suppose that $(Q^{\sigma,h,\theta})_{\Sigma,S}$ does not
exist, that is, the body of $(Q^{\sigma,h,\theta})_{\Sigma,S}$ has an infinite
number of relational subgoals, using an infinite number
of variable names. We then show that the body of
$(Q)_{\Sigma,S}$ also has an infinite number of relational subgoals
(using an infinite number of variable names), and thus
arrive at the desired contradiction. The procedure is
to apply to Q all the chase steps that are applicable to
$Q^{\sigma,h,\theta}$. Specifically, for each chase step s that applies
on $Q^{\sigma,h,\theta}$ using a homomorphism $\mu$, we apply the same
chase step to the result Q' of the chase step on Q using $\sigma$
and the h of $Q^{\sigma,h,\theta}$. In each s we use the homomorphism
that is a composition of $\mu$ with a homomorphism
that results from putting together the identity mapping
(on some of the subgoals) and $\theta^{-1}$, for the $\theta$ used in
defining $Q^{\sigma,h,\theta}$. (By definition, $\theta$ is injective and thus
$\theta^{-1}$ exists.)

^Recall that we assume throughout the paper that $\Sigma$ is the
regularized version of any given set of tgds and egds.



Observe that this "simulation" on Q' of the infinite
chase on $Q^{\sigma,h,\theta}$ cannot collapse the infinite number of
variables in $(Q^{\sigma,h,\theta})_{\Sigma,S}$ into a finite number of variables
(and thus into a finite number of subgoals) in the
"simulation result". The reason is, the language of embedded
dependencies cannot specify the instruction "generate
a new variable name, using the right-hand side of the
tgd in question, only if some variable names are not
the same in the left-hand side of the tgd in question".
Q.E.D. □

We are finally ready to define assignment-fixing tgds.

Definition 4.3. (Assignment-fixing tgd) Given a
CQ query Q and a finite set $\Sigma$ of tgds and egds such that
$(Q)_{\Sigma,S}$ exists, let $\sigma \in \Sigma$ be a regularized tgd with
existential variables $Z_1, \ldots, Z_k$, $k \geq 0$, such that chase of Q
with $\sigma$ is applicable, with associated test query $Q^{\sigma,h,\theta}$.
Then $\sigma$ is an assignment-fixing tgd w.r.t. Q and h if
$(Q^{\sigma,h,\theta})_{\Sigma,S}$ has at most one of $Z_i$ and $\theta(Z_i)$ for each
$i \in \{1, \ldots, k\}$. Further, $\sigma$ is an assignment-fixing tgd
w.r.t. Q if $\sigma$ is an assignment-fixing tgd w.r.t. Q and
some homomorphism h. □

Proposition 4.3. In the setting of Definition 4.3,
whenever $\sigma$ is a full tgd (i.e., a tgd without existential
variables), then $\sigma$ is an assignment-fixing tgd w.r.t. all
CQ queries Q such that chase using $\sigma$ is applicable to
Q and such that $(Q)_{\Sigma,S}$ exists. □

Consider two illustrations of the determination whether
a tgd with existential variables is assignment fixing w.r.t.
a given CQ query. Example 4.2 is a positive example, in
that it establishes a tgd as assignment fixing, whereas
Example 4.3 is a negative example.

EXAMPLE 4.2. On database schema $\mathcal{D} = \{P, R, S\}$,
consider a regularized set of embedded dependencies
$\Sigma = \{\sigma_1, \sigma_2, \sigma_3\}$, where $\sigma_1$ is a tgd,

$\sigma_1 : p(X, Y) \rightarrow \exists Z\, \exists W\ r(X, Z) \wedge s(Z, W),$

egd $\sigma_2$ establishes the first attribute of R as its superkey,
and, finally, egd $\sigma_3$ is as follows:

$\sigma_3 : r(X, Y) \wedge s(Y, T) \wedge r(X, Z) \wedge s(Z, W) \rightarrow T = W.$

Let CQ query Q be $Q(X) :- p(X, Y)$. Chase using
$\sigma_1$ is applicable to Q, using homomorphism
$h = \{X \rightarrow X, Y \rightarrow Y\}$. For the query

$Q^{\sigma_1,h,\theta}(X) :- p(X, Y), r(X, Z), s(Z, W), r(X, Z_1), s(Z_1, W_1).$

constructed using $\theta = \{Z \rightarrow Z_1, W \rightarrow W_1\}$, we have

$(Q^{\sigma_1,h,\theta})_{\Sigma,S}(X) :- p(X, Y), r(X, Z), s(Z, W).$

Thus, $\sigma_1$ is an assignment-fixing tgd w.r.t. Q, because
the body of $(Q^{\sigma_1,h,\theta})_{\Sigma,S}(X)$ has only one of Z and $Z_1$
and only one of W and $W_1$. □

EXAMPLE 4.3. Using the database schema and
dependency $\sigma_2$ of Example 4.2, we replace $\sigma_1$ of that
example with a regularized tgd $\sigma_4$:

$\sigma_4 : p(X, Y) \rightarrow \exists Z, W, T\ r(X, Z) \wedge s(Z, W) \wedge s(X, T).$

We also replace $\sigma_3$ of the example with egd $\sigma_5$, and
add an egd $\sigma_6$:



$\sigma_5 : r(X, Z) \wedge s(Z, W) \wedge s(X, T) \rightarrow W = T.$
(76 : pIx, Y) a r(A, X) A s(X, T)->X = T. 

We denote by $\Sigma'$ the set of dependencies $\{\sigma_2, \sigma_4, \sigma_5, \sigma_6\}$.

Consider again query $Q(X) :- p(X, Y)$. Chase using
$\sigma_4$ is applicable to Q, using the identity homomorphism
h. For the query

$Q^{\sigma_4,h,\theta}(X) :- p(X, Y), r(X, Z), s(Z, W), s(X, T),$
$r(X, Z_1), s(Z_1, W_1), s(X, T_1).$

constructed using $\theta = \{Z \rightarrow Z_1, W \rightarrow W_1, T \rightarrow T_1\}$,
we have

$(Q^{\sigma_4,h,\theta})_{\Sigma',S}(X) :- p(X, Y), r(X, Z),$
$s(Z, W), s(X, W), s(Z, W_1), s(X, W_1).$

Thus, $\sigma_4$ is not an assignment-fixing tgd w.r.t. Q by
definition, because the body of $(Q^{\sigma_4,h,\theta})_{\Sigma',S}(X)$ has both
of W and $W_1$. □

4.2.2 Motivation for Regularized Assignment-Fixing Tgds

One may wonder whether the notions introduced in
Section 4.2.1 are justified. In this subsection we illustrate
that whenever a non-regularized tgd or a tgd that
is not assignment fixing is used in chase step $Q \Rightarrow^{\sigma} Q'$,
then the chase result Q' may be nonequivalent to Q
under bag or bag-set semantics.
Examples 4.4 through 4.6 establish the need for regularized
tgds and for the (traditional) definition of the
chase step for tgds, see Section 2.4 in this paper.
Example 4.7 shows an unsound chase step using a regularized
tgd that is not assignment fixing w.r.t. the query. Finally,
Example 4.8 demonstrates a sound chase step using
a regularized assignment-fixing tgd, and illustrates
how the notion of assignment-fixing tgds is strictly more
general than that of key-based dependencies (see
Definition 5.1).

EXAMPLE 4.4. Consider Example 4.1, where tgd
$\sigma_4$ is not key based in presence of the set $\Sigma$ of embedded
dependencies in the example, by the definition of [5], see
Definition 5.1. For the reader's convenience, we provide
here the tgd $\sigma_4$ and query $Q_4$ of Example 4.1:

$\sigma_4 : p(X, Y) \rightarrow u(X, Z) \wedge t(X, Y, W)$
$Q_4(X) :- p(X, Y).$

Now consider the result of removing from $\Sigma$ the tgd
$\sigma_2$ of Example 4.1; we denote by $\Sigma'$ the set
$\Sigma' = \Sigma - \{\sigma_2\}$. In presence of $\Sigma'$, the tgd $\sigma_4$ is still not key based.
However, if we refrain from applying $\sigma_4$ to $Q_4$ in chase
under bag or bag-set semantics, then we will miss the
rewriting $Q_3$ (of Example 4.1) of $Q_4$. Indeed, by the
results of this paper it holds that $Q_3 \equiv_{\Sigma',B} Q_4$ and that
$Q_3 \equiv_{\Sigma',BS} Q_4$. □

Observe that tgd $\sigma_4$ in Example 4.4 is not regularized,
see Definition 4.1. We miss an equivalent rewriting
of the input query $Q_4$ by refraining from applying
the tgd. Consider now Example 4.5, where we do apply
the nonregularized tgd $\sigma_4$ in its entirety to the query
$Q_4$. However, instead of the query $Q_3$, which is equivalent
to $Q_4$ in presence of $\Sigma'$ under each of bag and
bag-set semantics, we obtain a formulation of $Q_4$ that
is not equivalent to $Q_4$ (in presence of $\Sigma'$) under either
semantics.



EXAMPLE 4.5. Consider the query $Q_4$ and set $\Sigma'$
of dependencies in Example 4.4. We now attempt to
find the rewriting $Q_3$ (of Example 4.1) that we failed to
obtain in Example 4.4:

$Q_3(X) :- p(X, Y), t(X, Y, W), s(X, Z).$

To find the rewriting $Q_3$, specifically to obtain its
T-subgoal, we apply the nonregularized dependency $\sigma_4$ to
the query $Q_4$. We denote by $Q'_4$ the result of the
application:

$Q'_4(X) :- p(X, Y), t(X, Y, W), u(X, Z).$

Recall from Example 4.1 that in presence of $\Sigma'$,
relation U does not have superkeys other than the set of
all its attributes. Using this information, we construct
a database D that is a counterexample to equivalence of
$Q_4$ and $Q'_4$ in presence of $\Sigma'$ and under bag-set semantics.
(Thus, by definition, D is also a counterexample
to the equivalence of the queries in presence of $\Sigma'$ and
under bag semantics.)

Let $D = \{P(1, 2), T(1, 2, 3), U(1, 4), U(1, 5)\}$. (Observe
that D is a set-valued database and that $D \models \Sigma'$.)
On database D, $Q_4(D, BS) = \{\{(1)\}\}$, whereas
$Q'_4(D, BS) = \{\{(1), (1)\}\}$. □

Note 1 on Example 4.5. The problem with applying
$\sigma_4$ to query $Q_4$ in the example is that $\sigma_4$ is not regularized.
The regularized set for $\sigma_4$ is $\{\sigma'_4, \sigma''_4\}$, where

$\sigma'_4 : p(X, Y) \rightarrow t(X, Y, W)$
$\sigma''_4 : p(X, Y) \rightarrow u(X, Z)$

Observe that tgd $\sigma'_4$ is assignment fixing in presence of
(the egds in) $\Sigma'$ (of Example 4.4), whereas $\sigma''_4$ is not.
Thus, $\sigma''_4$ cannot be applied in sound chase of $Q_4$ using
$\Sigma'$ under bag or bag-set semantics, by our main results
of this section. Using the regularized version of $\Sigma'$ (this
version also replaces $\sigma_1$ of Example 4.1 with its regularized
set), we can perform sound chase of $Q_4$ to obtain the
above query $Q_3$, which is equivalent to $Q_4$ in presence
of $\Sigma'$ under each of bag and bag-set semantics (with the
usual restriction of set-valued relations in the case of
bag semantics).

We now examine the modified definition of chase, see
Section 2.4 of [5]. Indeed, using that definition we obtain
correctly the terminal chase results of the query
$Q_4$ in Example 4.1 even though not all input tgds are
regularized, see Examples 4.1 and 5.1 of [5]. However,
as we see in the next example, using the modified
definition of chase does not result in sound chase (under
bag or bag-set semantics) for all problem inputs.

EXAMPLE 4.6. Consider query Q and set $\Sigma = \{\nu_1, \nu_2\}$
of dependencies, where

$Q(X) :- p(X, Y), s(X, Z)$

$\nu_1 : p(X, Y) \rightarrow \exists Z\ s(X, Z) \wedge t(Z, Y)$

$\nu_2 : t(X, Y) \wedge t(Z, Y) \rightarrow X = Z$

Observe that $\nu_1$ is a regularized tgd and is also assignment
fixing w.r.t. Q, by our definitions in this section.
We now apply modified chase as defined in Section 2.4
of [5] and obtain query Q':

$Q'(X) :- p(X, Y), s(X, Z), t(Z, Y).$

We show nonequivalence of Q to Q' in presence of $\Sigma$
under each of bag and bag-set semantics, by constructing
a database D that is a counterexample to either equivalence.
Indeed, let $D = \{P(1, 2), S(1, 1), S(1, 3), T(3, 2)\}$.
(Observe that D is a set-valued database and that $D \models \Sigma$.)
On database D, $Q(D, BS) = \{\{(1), (1)\}\}$, whereas
$Q'(D, BS) = \{\{(1)\}\}$. □

Note on Example 4.6. The application of $\nu_1$ to Q
in the example is sound by the (incorrect) definition
of key-based chase steps in [5]. Still, the application
of the regularized and assignment-fixing tgd $\nu_1$ using
the modified definition of the chase step does result in
unsound chase, as shown in Example 4.6.

We now show an example of using a regularized but
not assignment-fixing tgd in a (traditional) chase step,
see Section 2.4 in this paper for the definition.

EXAMPLE 4.7. Recall the database schema $\mathcal{D} = \{P, R, S\}$
and dependencies $\Sigma' = \{\sigma_2, \sigma_4, \sigma_5\}$ of
Example 4.3:

$\sigma_2 : r(X, Y) \wedge r(X, Z) \rightarrow Y = Z.$

$\sigma_4 : p(X, Y) \rightarrow \exists Z, W, T\ r(X, Z) \wedge s(Z, W) \wedge s(X, T).$

$\sigma_5 : r(X, Z) \wedge s(Z, W) \wedge s(X, T) \rightarrow W = T.$

Recall that $\sigma_4$ is regularized but not assignment fixing
w.r.t. query $Q(X) :- p(X, Y)$; see Example 4.3 for
the details. We apply the chase step using tgd $\sigma_4$ to Q,
to obtain the result Q'':

$Q(X) :- p(X, Y).$

$Q''(X) :- p(X, Y), r(X, Z), s(Z, W), s(X, T).$

To construct a counterexample to equivalence of Q
and Q'' in presence of $\Sigma'$, under each of bag and bag-set
semantics, we use the query $(Q^{\sigma_4,h,\theta})_{\Sigma',S}$ of Example 4.3.
Specifically, we use as a counterexample the
canonical database, call it D, of $(Q^{\sigma_4,h,\theta})_{\Sigma',S}$; we have
that D is set valued and that $D \models \Sigma'$ by definition of
the query $(Q^{\sigma_4,h,\theta})_{\Sigma',S}$.

Consider the database $D = \{P(1, 2), R(1, 3), S(1, 4),
S(1, 5), S(3, 4), S(3, 5)\}$. (Recall that the canonical database
of a CQ query is unique up to the choice of constants.)
We have that $Q(D, BS) = \{\{(1)\}\}$, whereas
$Q''(D, BS) = \{\{(1), (1), (1), (1)\}\}$. □

By our main results in this section, for the Q, $\Sigma$, and
$\nu_1$ of Example 4.6, the application of $\nu_1$ to Q (using the
traditional definition of chase steps using tgds, see
Section 2.4 in this paper) is sound in presence of $\Sigma$ under
each of bag and bag-set semantics (provided that for the
case of bag semantics, both S and T are set-valued relations
in all instances of $\{P, S, T\}$). Example 4.8 shows
the chase step.

EXAMPLE 4.8. Consider the query Q and set
$\Sigma = \{\nu_1, \nu_2\}$ of dependencies of Example 4.6. Recall that $\nu_1$
is a regularized tgd and is also assignment fixing w.r.t.
Q in presence of the egds of $\Sigma$. We now apply (traditional)
chase as defined in Section 2.4 in this paper, to
obtain query Q'':

$Q''(X) :- p(X, Y), s(X, Z), s(X, W), t(W, Y).$

The difference from our application of $\nu_1$ in Example 4.6
is that we now add a new S-subgoal in addition
to a new T-subgoal. By the definition of chase steps using
tgds, the second attribute of S must be denoted by
different variables in the two S-subgoals in query Q''. □
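
To see the contrast between Examples 4.6 and 4.8 on concrete data, the following Python sketch evaluates Q, Q', and Q'' under bag-set semantics on the database D of Example 4.6. The helper name `bag_set_answer` and the encoding of subgoals as (relation, variables) pairs are assumptions made for illustration. Under bag-set semantics the stored relations are sets, and the multiplicity of an answer tuple is the number of satisfying assignments to the body variables that produce it.

```python
from collections import Counter


def bag_set_answer(head, body, db):
    """Answer a CQ query under bag-set semantics (illustrative helper).

    head: tuple of variable names.
    body: list of (relation_name, variable_names) subgoals.
    db:   dict mapping relation_name -> set of tuples.
    """
    answers = Counter()

    def extend(i, assignment):
        if i == len(body):
            # One satisfying assignment found: emit one copy of the head.
            answers[tuple(assignment[v] for v in head)] += 1
            return
        rel, variables = body[i]
        for row in db[rel]:
            candidate = dict(assignment)
            if all(candidate.setdefault(v, c) == c
                   for v, c in zip(variables, row)):
                extend(i + 1, candidate)

    extend(0, {})
    return answers


# Database D of Example 4.6.
D = {"p": {(1, 2)}, "s": {(1, 1), (1, 3)}, "t": {(3, 2)}}

Q = [("p", ("X", "Y")), ("s", ("X", "Z"))]
Q_modified = Q + [("t", ("Z", "Y"))]                             # Q' of Example 4.6
Q_traditional = Q + [("s", ("X", "W")), ("t", ("W", "Y"))]       # Q'' of Example 4.8

print(bag_set_answer(("X",), Q, D))              # Counter({(1,): 2})
print(bag_set_answer(("X",), Q_modified, D))     # Counter({(1,): 1})
print(bag_set_answer(("X",), Q_traditional, D))  # Counter({(1,): 2})
```

On this database the traditional chase step preserves the multiplicity of the answer, whereas reusing the existing S-subgoal does not. A single database cannot establish equivalence, so this is only a sanity check of the examples.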



Note on Example 4.8. Recall that $\nu_1$ in the example
is assignment fixing w.r.t. the query, and thus by our
results can be applied in sound chase under bag and
bag-set semantics (provided that for the case of bag
semantics, both S and T are set-valued relations in all
instances of $\{P, S, T\}$). At the same time, $\nu_1$ is not key
based by the definition of [5], see Definition 5.1 in this
paper. The problem is with the S-atom of $\nu_1$, which is
not key based in presence of $\Sigma$ by Definition 5.1.

4.2.3 Assignment-Fixing Chase 

We begin the exposition of the main results of this 
section by defining assignment- fixing chase steps using 
tgds. 

Definition 4.4. (Assignment-fixing chase step
using tgd) Let $\sigma$ be a regularized tgd in a finite set
$\Sigma$ of embedded dependencies on schema $\mathcal{D}$. Consider a
CQ query Q defined on $\mathcal{D}$, such that $(Q)_{\Sigma,S}$ exists and
such that $\sigma$ is applicable to Q. Then the chase step that
applies $\sigma$ to Q is an assignment-fixing chase step using
$\sigma$ whenever $\sigma$ is an assignment-fixing tgd w.r.t. Q. □

We now provide necessary and sufficient conditions 
for soundness of chase steps under bag semantics for 
query evaluation. 

Theorem 4.1. Given a CQ query Q and a set of embedded
dependencies $\Sigma$ on schema $\mathcal{D}$. Under bag
semantics, a chase step $Q \Rightarrow^{\sigma} Q'$ using $\sigma \in \Sigma$ is sound iff

1. $Q \Rightarrow^{\sigma} Q'$ is a (tgd) assignment-fixing chase step,
and for each subgoal $s(p_{ij})$ that the chase step adds
to Q, relation $P_{ij}$ is set valued on all databases
satisfying $\Sigma$; or

2. In $Q \Rightarrow^{\sigma} Q'$, $\sigma$ is an egd; in this case, duplicates of
subgoal s(p) in Q' can be removed only if relation
P is set valued in all instances of $\mathcal{D}$. □

In Section 4.2.2, Example 4.7 shows an unsound chase
step using a regularized tgd that is not assignment fixing
w.r.t. the query. Example 4.8 in Section 4.2.2
demonstrates a sound (by Theorem 4.1) chase step using
a regularized assignment-fixing tgd, provided that
both S and T are set-valued relations in all instances
of the database schema used in the example. Relaxing
this set-valued requirement would result in an unsound
chase step using the same tgd, as is easy to demonstrate
using a counterexample bag-valued database.

The requirement that certain stored relations be set
valued arises naturally if one seeks soundness of
bag-semantics chase, see [9]. We now show that constraints
that force certain relations to be sets on all instances of
a database schema can be formally defined as egds, provided
that row (tuple) IDs are defined for the respective
relations. In the common practice of using tuple IDs in
database systems, each tuple in a (bag-valued) relation
is assigned a unique tuple ID. Then the set-enforcing
egd on relation P can be expressed as a functional
dependency (fd, defined in Appendix B), which specifies
that whenever two tuples of P agree on everything except
the tuple IDs, then the tuples must also agree on
the tuple IDs. Please see Appendix C for the details of
our set-enforcing framework based on tuple IDs.
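
The effect of such a set-enforcing functional dependency can be illustrated with a short Python sketch. The helper names `with_tuple_ids` and `satisfies_set_enforcing_fd`, and the choice of consecutive integers as tuple IDs, are assumptions made for illustration and are not taken from the framework of Appendix C. The check succeeds exactly when no two distinct tuple IDs carry the same values in all remaining attributes, that is, when the underlying bag contains no duplicates.

```python
def with_tuple_ids(bag):
    """Attach a unique tuple ID to every tuple occurrence in a bag."""
    return [(tid, row) for tid, row in enumerate(bag)]


def satisfies_set_enforcing_fd(rows_with_ids):
    """Check the fd 'all non-ID attributes determine the tuple ID'."""
    first_id_for = {}
    for tid, row in rows_with_ids:
        # Two occurrences that agree on every non-ID attribute must
        # also agree on the tuple ID; otherwise the fd is violated.
        if first_id_for.setdefault(row, tid) != tid:
            return False
    return True


set_valued = with_tuple_ids([(1, 5), (1, 6)])
bag_valued = with_tuple_ids([(1, 5), (1, 5)])

print(satisfies_set_enforcing_fd(set_valued))  # True
print(satisfies_set_enforcing_fd(bag_valued))  # False
```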

* Recall that we consider only finite regularized sets of de- 
pendencies throughout this paper. 



We now discuss item 2 of Theorem 4.1. Given a
database schema $\mathcal{D}$, suppose that for some of the relation
symbols $\{P_1, \ldots, P_k\} \subseteq \mathcal{D}$ it holds that the relation
for each of $P_1, \ldots, P_k$ is required to be set valued
in all instances D over $\mathcal{D}$. For such scenarios, the
bag-equivalence test of Theorem 2.1 is no longer a necessary
condition for bag equivalence of CQ queries.

EXAMPLE 4.9. By Theorem 2.1, query $Q_3$ of Example 4.1
is not bag equivalent to query $Q_5$:

$Q_5(X) :- p(X, Y), t(X, Y, W), s(X, Z), s(X, Z).$

Here, the only difference between $Q_3$ and $Q_5$ is the extra
copy of subgoal s(X, Z) in $Q_5$. At the same time, $Q_3$
and $Q_5$ are bag equivalent on all bag-valued databases
where relation S is required to be a set. Please see
Theorem 4.2 and Appendix D for the details. □

We now formulate the extended sufficient and necessary
condition. Please see Appendix D for the proof.

Theorem 4.2. Let $\{P_1, \ldots, P_k\} \subseteq \mathcal{D}$ be the maximal
set of relation symbols in schema $\mathcal{D}$ such that the
relation for each of $P_1, \ldots, P_k$ is required to be set valued
in all instances D over $\mathcal{D}$. Given CQ queries $Q_1$,
$Q_2$ on $\mathcal{D}$, let query $Q'_1$ ($Q'_2$, respectively) be obtained
by removing from $Q_1$ (from $Q_2$, respectively) all duplicate
subgoals whose predicates correspond to $P_1, \ldots, P_k$.
Then $Q_1 \equiv_B Q_2$ in the absence of all dependencies other
than the set-enforcing dependencies on $P_1, \ldots, P_k$ of the
schema $\mathcal{D}$ if and only if $Q'_1$ and $Q'_2$ are isomorphic. □
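
The test of Theorem 4.2 can be sketched in Python as follows: drop duplicate subgoals over the set-valued relations, then test the reduced queries for isomorphism. The function names `drop_set_duplicates` and `isomorphic` and the encoding of queries are assumptions made for illustration. The brute-force search over variable bijections is chosen for brevity only, and the sketch assumes constant-free queries whose head variables all occur in the body.

```python
from collections import Counter
from itertools import permutations


def drop_set_duplicates(body, set_valued):
    """Remove repeated subgoals whose relation is required to be a set."""
    seen, reduced = set(), []
    for atom in body:
        if atom[0] in set_valued and atom in seen:
            continue
        seen.add(atom)
        reduced.append(atom)
    return reduced


def isomorphic(head1, body1, head2, body2):
    """Brute-force isomorphism test for two constant-free CQ queries."""
    vars1 = sorted({v for _, vs in body1 for v in vs})
    vars2 = sorted({v for _, vs in body2 for v in vs})
    if (len(vars1), len(body1), len(head1)) != (len(vars2), len(body2), len(head2)):
        return False
    target = Counter(body2)  # bodies are compared as bags of subgoals
    for perm in permutations(vars2):
        m = dict(zip(vars1, perm))  # candidate bijection on variables
        if tuple(m[v] for v in head1) != tuple(head2):
            continue
        if Counter((rel, tuple(m[v] for v in vs)) for rel, vs in body1) == target:
            return True
    return False


head = ("X",)
Q3 = [("p", ("X", "Y")), ("t", ("X", "Y", "W")), ("s", ("X", "Z"))]
Q5 = Q3 + [("s", ("X", "Z"))]
set_valued = {"s", "t"}  # S and T are set valued in Example 4.1

print(isomorphic(head, Q3, head, Q5))  # False
print(isomorphic(head, drop_set_duplicates(Q3, set_valued),
                 head, drop_set_duplicates(Q5, set_valued)))  # True
```

The first call corresponds to the plain isomorphism test on the queries of Example 4.9, and the second to the test after duplicate removal over the set-valued relation S.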

The correctness of the duplicate-removal rule of item
2 in Theorem 4.1 is immediate from Theorem 4.2.

We now spell out the necessary and sufficient condi- 
tions for soundness of chase steps under bag-set seman- 
tics for query evaluation. 

Theorem 4.3. Given a CQ query Q and a set of embedded
dependencies $\Sigma$. Under bag-set semantics, a
chase step $Q \Rightarrow^{\sigma} Q'$ using $\sigma \in \Sigma$ is sound iff

1. $Q \Rightarrow^{\sigma} Q'$ is a (tgd) assignment-fixing chase step;
or

2. In $Q \Rightarrow^{\sigma} Q'$, $\sigma$ is an egd. □

We use Examples 4.7 and 4.8 of Section 4.2.2 to make
here the same points as for Theorem 4.1. Observe that
(unlike the case of bag semantics) the set-valuedness
requirement is satisfied by definition of bag-set semantics.
See Example 4.1 for query $Q_2$ that is obtained from $Q_4$
by using, among other sound chase steps, a chase step
involving dependency $\sigma_3$. By Theorem 4.1, $\sigma_3$ may not
be used in sound chase under bag semantics, because
relation R is not guaranteed to be set valued in all
instances of the database schema of the example.

Proof. (Theorems 4.1 and 4.3, sketch.) We outline
here the correctness proof for chase steps using
tgds. Please see Appendix E for the details of
disproving soundness of chase steps under bag semantics
whenever chase (using even regularized and
assignment-fixing tgds) adds query subgoals whose associated base
relations are not set valued in all instances of the given
database schema.

^Recall that we consider only finite regularized sets of de- 
pendencies throughout this paper. 



Consider a CQ query Q and a set of dependencies $\Sigma$
defined on schema $\mathcal{D}$, such that $(Q)_{\Sigma,S}$ exists. Let $\sigma \in \Sigma$
be a regularized dependency such that chase using $\sigma$
applies to Q (using a homomorphism h) and results in
query Q'. (That is, $Q \Rightarrow^{\sigma} Q'$ is defined.) Further,
suppose that for all subgoals that are in Q' but not in
Q, the respective base relations, call them collectively
$\mathcal{S} \subseteq \mathcal{D}$, are set valued in all instances of the schema $\mathcal{D}$.

Case (1): Let $\sigma$ be an assignment-fixing tgd w.r.t. the
query Q. We prove that on all instances D of $\mathcal{D}$ such
that $D \models \Sigma$ and such that at least the relations in $\mathcal{S}$
are set valued on D, it holds that $Q(D, B) = Q'(D, B)$
and that $Q(D, BS) = Q'(D, BS)$. (Thus, the chase step
$Q \Rightarrow^{\sigma} Q'$ is sound under the conditions of Theorems 4.1
and 4.3.)

We fix an arbitrary database D as described above.
The idea of the proof is to establish a 1:1 correspondence
between all the assignments satisfied by Q w.r.t. D and
all the assignments satisfied by Q' w.r.t. D. As a result
(and using the fact that the $\mathcal{S}$-part of the base relations
in D is guaranteed to be set valued), we obtain that for
each tuple $t \in Q(D, B)$, such that the multiplicity of t
in $Q(D, B)$ is $m > 0$, the multiplicity of t in $Q'(D, B)$
is also m.

We establish the 1:1 correspondence as follows. 

(i) For each assignment $\mu'$ that satisfies Q' w.r.t. D,
there exists exactly one assignment $\mu$ that (a) satisfies
Q w.r.t. D, and that (b) coincides with $\mu'$
on the set of body variables of Q. (Recall that $\sigma$
is a tgd, and therefore the set of body variables of
Q' is a superset of the set of body variables of Q.)

(ii) For each assignment $\mu$ that satisfies Q w.r.t. D,
there exists at least one assignment $\mu'$ that (a) satisfies
Q' w.r.t. D, and that (b) coincides with $\mu$ on
the set of body variables of Q. This is immediate
from the fact that $D \models \Sigma$.

(iii) From the fact that $\sigma$ is assignment fixing w.r.t. Q,
we obtain that for each $\mu$ as in (ii) there exists at
most one corresponding $\mu'$ as in (ii). Indeed, suppose
that for some such $\mu$ there exist at least two
assignments $\mu'_1$ and $\mu'_2$ that satisfy the conditions
of (ii). Then we show by obtaining the chase result
$(Q^{\sigma,h,\theta})_{\Sigma,S}$, in Definition 4.3, that $\mu'_1$ and $\mu'_2$ must
be identical on all databases satisfying $\Sigma$.

The observation that D is an arbitrary database satisfying
the conditions above concludes the proof of Q ≡_{Σ,B} Q'
in this case (1). Further, Q ≡_{Σ,BS} Q' is immediate
from Q ≡_{Σ,B} Q'.

Case (2): Let σ not be assignment fixing w.r.t. the
query Q. We construct a set-valued database D (with
schema 𝒟) such that D ⊨ Σ and such that Q(D, BS) ≠
Q'(D, BS). (As a result, neither of Q ≡_{Σ,B} Q' and
Q ≡_{Σ,BS} Q' holds, and therefore the chase step
Q ⇒^σ Q' is not sound in this case under bag or bag-set
semantics.)

As a counterexample database D we use the canonical
database of the query (Q^{σ,h,θ})_{Σ,S}, see Definition 4.3.
Example 4.7 illustrates the construction.

Let ν be the satisfying (by definition of canonical
databases and by definition of chase under set semantics)
assignment to the head variables X of Q^{σ,h,θ} w.r.t.
the database D. Observe that the vectors of head
variables of all of Q, Q', and Q^{σ,h,θ} are the same by
definition of Q^{σ,h,θ}. By definition of Q^{σ,h,θ}, there
exists an extension ν_Q of ν to all the body variables of
Q such that ν_Q satisfies Q w.r.t. D, and there exists an
extension ν_{Q'} of ν to all the body variables of Q' such
that ν_{Q'} satisfies Q' w.r.t. D.

We make the following observations about the an- 
swers to Q and Q' under bag-set semantics on the set- 
valued database D. 

(i) For each assignment μ' such that μ'|_X = ν and
such that μ' satisfies Q' w.r.t. D (we have shown
that there exists at least one such assignment μ'),
there exists exactly one assignment μ that (a) satisfies
Q w.r.t. D, and that (b) coincides with μ' on the set of
body variables of Q. (See (i) under case (1) of the
proof.) Observe that μ|_X = ν by definition of μ.

(ii) For each assignment μ such that μ|_X = ν and such
that μ satisfies Q w.r.t. D (we have shown that there
exists at least one such assignment μ), there exists at
least one assignment μ' that (a) satisfies Q' w.r.t. D,
and that (b) coincides with μ on the set of body
variables of Q. (See (ii) under case (1) of the proof.)
Observe that μ'|_X = ν by definition of μ'.

(iii) On our counterexample database D, there exists
at least one μ with μ|_X = ν and such that μ is a
satisfying assignment w.r.t. Q and D, such that μ
corresponds to at least two distinct satisfying
assignments μ'_1 and μ'_2 w.r.t. Q' and D, where each
of μ'_1 and μ'_2 coincides with μ on all the body
variables of Q. Indeed, we recall that D is the canonical
database of (Q^{σ,h,θ})_{Σ,S}. If the distinct μ'_1 and
μ'_2 as above did not exist, then chase of Q^{σ,h,θ}
using Σ under set semantics would lead to the
"elimination of the distinction between" the groups of
subgoals ψ(h(X), Z) and ψ(h(X), θ(Z)) of Q^{σ,h,θ}
(see Definition 4.3) in the terminal chase result of
Q^{σ,h,θ} using Σ. But if ψ(h(X), Z) and ψ(h(X), θ(Z))
collapse into the same group in (Q^{σ,h,θ})_{Σ,S}, then
σ is an assignment-fixing tgd w.r.t. Q by Definition 4.3,
which contradicts our assumption.

We conclude that in Case (2), the multiplicity of the
tuple ν(X) is strictly greater in Q'(D, BS) than in
Q(D, BS) on our counterexample database D. Thus,
Q'(D, BS) ≠ Q(D, BS). Q.E.D. □

5. UNIQUE RESULT OF SOUND CHASE 

In this section we show that the result of sound chase
of CQ queries using arbitrary finite sets of embedded
dependencies is unique under each of bag and bag-set
semantics for query evaluation. Further, we provide an
algorithm for constructing, for a given CQ query Q and
an arbitrary finite set of embedded dependencies Σ, the
maximal subset Σ^{max}_B(Q, Σ) of Σ such that
D^{(Q_n)} ⊨ Σ^{max}_B(Q, Σ), where Q_n is the result of
sound chase of Q under bag semantics and D^{(Q_n)} is
the canonical database of Q_n. We also outline a version
of the algorithm that works for the case of bag-set
semantics.



5.1 Why Not Key-Based Tgds? 

We begin the discussion by examining the question
of why the definition of assignment-fixing chase steps
(Definition 4.4) cannot be simplified. The intuition
behind the notion of assignment-fixing chase steps is
that of ensuring that in each assignment-fixing chase
step Q ⇒^σ Q', using some tgd σ ∈ Σ, each tuple
in the bag Q(D, B) would have the same multiplicity
in the bag Q'(D, B), for each database D ⊨ Σ, in
presence of the requisite set-enforcing constraints (of
Appendix C). The intuition is the same for bag-set
semantics. It appears that a simpler notion, that of
key-based tgds, would suffice. In the definition that
follows, we use the notation of Definition 4.4.

Definition 5.1. (Key-based tgd) Let σ : φ(X, Y) →
∃Z ψ(Y, Z) be a tgd on database schema 𝒟. Then σ is
a key-based tgd if, for each atom p(Y', Z') in ψ, Y' is
a superkey of relation P in 𝒟 and, in addition, P is set
valued on all instances of 𝒟. □

The notion of key-based tgds is equivalent to that of
UWDs of [9]. Note that by Definition 4.4, all chase steps
using key-based tgds are assignment fixing. However,
the class of assignment-fixing tgds (w.r.t. the given CQ
query and set of dependencies) includes not just
key-based tgds, as illustrated in Example 4.8. In addition,
unlike assignment-fixing chase steps specified in
Definition 4.4, a key-based tgd is defined independently of
the queries being chased. Deutsch [9] showed that the
result of sound chase of CQ queries under bag semantics
is unique up to isomorphism, provided that all tgds in
the given set of dependencies are key based.

It turns out that the "key-basedness" constraints of
Definition 5.1 on tgds are not necessary for soundness
of chase under either of bag and bag-set semantics.
Indeed, consider a modification of Example 4.3.

EXAMPLE 5.1. In the setting of Example 4.3, we
replace the query Q by a query Q'(X) :- p(X, Y), r(A, X),
and keep the set Σ' of dependencies of Example 4.3. We
can show that tgd σ4 ∈ Σ' is assignment fixing w.r.t. Q'.
Recall that σ4 is not an assignment-fixing tgd w.r.t. the
query Q of Example 4.3. □

5.2 Uniqueness of Result of Sound Chase 

We now show that the result of sound chase of CQ
queries using arbitrary sets of embedded dependencies⁶
is unique under bag and bag-set semantics, up to
equivalence in the absence of dependencies (except for the
set-enforcing dependencies under bag semantics).
(Recall that throughout the paper we assume that all given
sets of embedded dependencies are finite and
regularized.) We give here a formulation of our result only
for the case of bag semantics. The version of
Theorem 5.1 for the case of bag-set semantics (formulated
in the appendix) is straightforward.

Theorem 5.1. Given a CQ query Q and set Σ of
embedded dependencies on schema 𝒟, such that there
exists a set-chase result (Q)_{Σ,S} for Q and Σ. Then
there exists a result (Q)_{Σ,B} of sound chase for Q and
Σ under bag semantics, unique up to isomorphism after
dropping duplicate subgoals that correspond to set-valued
relations in 𝒟. That is, for two sound-chase results
(Q)^{(1)}_{Σ,B} and (Q)^{(2)}_{Σ,B} for Q and Σ,
(Q)^{(1)}_{Σ,B} ≡_B (Q)^{(2)}_{Σ,B}
in the absence of all dependencies other than the
set-enforcing dependencies on stored relations. □

⁶Cf. the result of [9] on uniqueness of sound bag chase for
key-based tgds only; see Section 5.1 for the discussion.

By Theorem 231 sound bag chase adds or drops only 
those subgoals whose predicates correspond to relations 
required to be sets. Thus, it is natural to use the
conditions of Theorem 4.2 rather than of Theorem 2.1 in
characterizing bag equivalence of terminal chase results.

To prove Theorem 5.1, we make the following
straightforward observation.

Proposition 5.1. Given CQ query Q and embedded
dependencies Σ such that there exists a set-chase result
(Q)_{Σ,S}. Then sound chase of Q using Σ terminates in
finite time under each of bag and bag-set semantics. □

This result is immediate from Theorems 4.1 and 4.3.
The rest of the proof of Theorem 5.1 is an adaptation,
to sound chase steps, of the proof of the fact (see [10])
that all set-chase results (when defined) for a given CQ
query are equivalent in the absence of dependencies.
Please see Appendix G for the details.

We now establish the complexity of sound bag and
bag-set chase under weakly acyclic dependencies [14].
Intuitively, weakly acyclic dependencies cannot
generate an infinite number of new variables, hence set-chase
under such dependencies terminates in finite time; please
see Appendix H for the definition. All sets of
dependencies in examples in this paper are weakly acyclic.

Theorem 5.2. Given a CQ query Q and set Σ of
weakly acyclic embedded dependencies on schema 𝒟.
Then sound chase of Q using Σ, under each of bag and
bag-set semantics, terminates in time polynomial in the
size of Q and exponential in the size of Σ. □

The upper bound is immediate from Proposition 5.1
and from the results in [T1[TT1[T3| for set semantics. For 
the lower bound, we exhibit an infinite family of pairs
(Q, Σ), where the size of each of (Q)_{Σ,B} and (Q)_{Σ,BS}
is polynomial in the size of Q and exponential in the size
of Σ. Please see Appendix H for the details.

5.3 Satisfiable Dependencies Are Query Based 

We now provide a constructive characterization of the
result of sound chase under bag and bag-set semantics.
This characterization, formulated in Theorem 5.3 for
bag semantics, settles the problem of which
dependencies Σ' are satisfied by the canonical database
D^{(Q_n)} of Q_n. Here, Q_n is the result of sound chase
of CQ query Q using embedded dependencies Σ. (We
assume that set chase of Q using Σ terminates in finite
time.)

Given a CQ query Q and a set of embedded
dependencies Σ, consider the canonical database D^{(Q_n)}
of the result Q_n = (Q)_{Σ,B} of sound chase of Q using Σ
under bag semantics. Clearly, at least some sets Σ' such
that D^{(Q_n)} ⊨ Σ' do not coincide with the original Σ.
(We refer here to the discussion in the beginning of
Section 4.) For instance, in Example 4.1 the canonical
database for query Q3 does not satisfy dependency σ4.
Observe that Q3 is the (unique, by Theorem 5.1) result
of sound chase of Q4 using Σ under bag semantics.

⁷See discussion of Theorem 4.2 in Section 4.2.



At the same time, for each pair (Q, Σ) there exists
a unique maximal-size set Σ^{max}_B(Q, Σ) ⊆ Σ, such that
D^{(Q_n)} ⊨ Σ^{max}_B(Q, Σ). (Appendix I has the proof of
Theorem 5.3 and the analogous result for bag-set
semantics.)

Theorem 5.3. (Unique Σ^{max}_B(Q, Σ) ⊆ Σ) Given a
CQ query Q and set Σ of embedded dependencies, such
that there exists a set-chase result (Q)_{Σ,S} for Q and Σ.
Let Q_n be the result of sound chase for Q and Σ under
bag semantics, with canonical database D^{(Q_n)}. Then
there exists a unique subset Σ^{max}_B(Q, Σ) of Σ, such
that:

• D^{(Q_n)} ⊨ Σ^{max}_B(Q, Σ), and

• for each proper superset Σ' of Σ^{max}_B(Q, Σ) such
that Σ' ⊆ Σ, D^{(Q_n)} ⊨ Σ' does not hold. □

It turns out that the set Σ^{max}_B(Q, Σ) is the result of
removing from Σ exactly those tgds σ such that the chase
step Q_n ⇒^σ Q', with some CQ outcome Q', is not
sound under bag semantics. This claim is immediate
from the observation that for each dependency σ in Σ
such that σ is applicable to Q_n, σ is unsoundly
applicable to Q_n. See Appendix I for the details. We make
the same observation about the unique set
Σ^{max}_{BS}(Q, Σ) ⊆ Σ such that Σ^{max}_{BS}(Q, Σ) is
the maximal set of dependencies satisfied by the
canonical database of the result of sound chase of Q
using Σ under bag-set semantics.

Not surprisingly, each of Σ^{max}_B(Q, Σ) and
Σ^{max}_{BS}(Q, Σ) is query dependent. Recall that in
Example 4.1 the canonical database of the query
Q3 = (Q4)_{Σ,B} does not satisfy dependency σ4 in the
set Σ given in the example. At the same time, it is easy
to see that for query Q(X) :- p(X, Y), u(X, Z), the
canonical database of the query (Q)_{Σ,B} does satisfy
dependency σ4 in the same set Σ.

We now establish a relationship between Σ^{max}_B(Q, Σ)
and Σ^{max}_{BS}(Q, Σ) for a fixed pair (Q, Σ). This
relationship is immediate from the theorems above.

Proposition 5.2. For (Q, Σ) satisfying conditions of
Theorem 5.3, Σ^{max}_B(Q, Σ) ⊆ Σ^{max}_{BS}(Q, Σ) ⊆ Σ. □

Query Q4 and dependencies Σ of Example 4.1 can
be used to show that both subset relationships can be
proper: Σ^{max}_B(Q, Σ) ⊂ Σ^{max}_{BS}(Q, Σ) ⊂ Σ.

We now outline algorithm Max-Bag-Σ-Subset, which
accepts as inputs a CQ query Q and a finite set Σ of
embedded dependencies such that (Q)_{Σ,S} exists. The
algorithm constructs the set Σ^{max}_B(Q, Σ) as specified
in Theorem 5.3. The counterpart of Max-Bag-Σ-Subset
for bag-set semantics can be found in Appendix I.

Algorithm 1: Max-Bag-Σ-Subset(Q, Σ)

Input : CQ query Q, set Σ of embedded dependencies
        such that chase result (Q)_{Σ,S} exists.
Output: Σ^{max}_B(Q, Σ) ⊆ Σ specified in Theorem 5.3.

1. (Q)_{Σ,B} := soundChase(B, Q, Σ);
2. Σ^{max}_B(Q, Σ) := Σ;
3. for each σ in Σ do
4.     if soundChaseStep(σ, B, (Q)_{Σ,B}) = false then
5.         Σ^{max}_B(Q, Σ) := Σ^{max}_B(Q, Σ) − {σ};
6. return Σ^{max}_B(Q, Σ);
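
The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch: the procedures soundChase and soundChaseStep are not implemented here. They are passed in as hypothetical callables `sound_chase` and `sound_chase_step`, and the toy stand-ins below (including the dependency names) are assumptions made purely to exercise the loop.

```python
def max_bag_sigma_subset(query, sigma, sound_chase, sound_chase_step):
    """Mirror lines 1-6 of Algorithm 1 (Max-Bag-Sigma-Subset).

    sound_chase and sound_chase_step are caller-supplied stand-ins for
    the procedures soundChase and soundChaseStep of the pseudocode.
    """
    chased = sound_chase("B", query, sigma)             # line 1
    kept = list(sigma)                                  # line 2
    for dep in sigma:                                   # line 3
        if not sound_chase_step(dep, "B", chased):      # line 4
            kept.remove(dep)                            # line 5
    return kept                                         # line 6


# Toy stand-ins (assumptions, NOT real chase procedures).
def toy_sound_chase(semantics, query, sigma):
    return query  # pretend the query is already fully chased


def toy_sound_chase_step(dep, semantics, chased_query):
    return dep != "sigma4"  # pretend only sigma4 is unsoundly applicable


result = max_bag_sigma_subset(
    "Q4", ["sigma1", "sigma2", "sigma4"],
    toy_sound_chase, toy_sound_chase_step)
print(result)
```

With real implementations of the two procedures substituted for the toy stand-ins, the returned list would play the role of the set constructed by the algorithm.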



The algorithm begins (line 1 of the pseudocode) by
computing the result (Q)_{Σ,B} of sound chase of Q using
Σ under bag semantics (B). This result exists and is
unique by Theorem 5.1. Then the algorithm removes
from the set Σ all dependencies that are unsoundly
applicable to (Q)_{Σ,B}, see lines 2-5 of the pseudocode.
Procedure soundChaseStep(σ, B, (Q)_{Σ,B}) (line 4) returns
true if and only if the bag-chase step using σ on (Q)_{Σ,B}
is sound by Theorem 4.1.

We obtain the following result by construction of
algorithm Max-Bag-Σ-Subset.

Theorem 5.4. (Correctness and complexity of
Max-Bag-Σ-Subset) Given a CQ query Q and set of
embedded dependencies Σ, such that there exists a
set-chase result (Q)_{Σ,S} for Q and Σ. Then algorithm
Max-Bag-Σ-Subset returns in finite time the set
Σ^{max}_B(Q, Σ) specified in Theorem 5.3. If dependencies
Σ are weakly acyclic, then the runtime of the algorithm is
polynomial in the size of Q and exponential in the size
of Σ. □

6. Σ-EQUIVALENCE TESTS FOR CQ AND
CQ-AGGREGATE QUERIES

We begin this section by providing equivalence tests
for CQ queries in presence of embedded dependencies
under bag and bag-set semantics, see Section 6.1. These
results allow us to develop: (1) Equivalence tests for CQ
queries with grouping and aggregation in presence of
embedded dependencies, see Section 6.2, and (2) Sound
and complete (whenever set-chase on the inputs
terminates) algorithms for solving instances of the CQ
class of the Query-Reformulation Problem under each
of bag and bag-set semantics, as well as for the
CQ-aggregate class of the problem, see Section 6.3. (Recall
that throughout the paper we assume that all given sets
of embedded dependencies are finite and regularized.)

6.1 Equivalence Tests for CQ Queries 

The main results of this section for CQ queries,
Theorems 6.1 and 6.2, are the analogs, for bag and bag-set
semantics, of the dependency-free test of Theorem 2.2
for equivalence of CQ queries under set semantics and
under embedded dependencies.

Theorem 6.1. Given CQ queries Q and Q', and a
set of embedded dependencies Σ such that there exist
set-chase results (Q)_{Σ,S} for Q and (Q')_{Σ,S} for Q'. Then
Q ≡_{Σ,B} Q' if and only if (Q)_{Σ,B} ≡_B (Q')_{Σ,B} in the
absence of all dependencies other than the set-enforcing
dependencies on stored relations.⁸ □

Theorem 6.2. Given CQ queries Q and Q', and a
set of embedded dependencies Σ such that there exist
set-chase results (Q)_{Σ,S} for Q and (Q')_{Σ,S} for Q'. Then
Q ≡_{Σ,BS} Q' if and only if (Q)_{Σ,BS} ≡_BS (Q')_{Σ,BS} in
the absence of dependencies. □

The proofs of Theorems 6.1 and 6.2 follow from
Proposition 5.1 and from Theorem 5.1 and its analog for
bag-set semantics. See Appendix J for the details.

We now formulate Proposition 6.1, which is the
dependency-based version of Proposition 2.1. The proof of
Proposition 6.1 can be found in Appendix K.

Proposition 6.1. For CQ queries Q and Q' and set
of embedded dependencies Σ, such that there exists the
set-chase result for each of Q and Q' using Σ. Then (1)
(Q)_{Σ,B} ≡_B (Q')_{Σ,B}, in the absence of all dependencies
other than the set-enforcing constraints on stored
relations, implies (Q)_{Σ,BS} ≡_BS (Q')_{Σ,BS}, and (2)
(Q)_{Σ,BS} ≡_BS (Q')_{Σ,BS} implies (Q)_{Σ,S} ≡_S (Q')_{Σ,S}. □

⁸See Theorem 4.2 and discussion of Theorem 5.1.



Observe that queries (Q)_{Σ,B}, (Q)_{Σ,BS}, and (Q)_{Σ,S}
may be distinct queries for the same query Q and set
Σ. For an illustration, please see the chase results Q1
through Q3 of query Q4 in Example 4.1.

A corollary of Proposition 6.1 establishes a
set-containment relationship between a CQ query and the
results of its sound chase under a given set of embedded
dependencies. Please see Appendix K for a proof.

Proposition 6.2. For (Q, Σ) that satisfy conditions
of Thm. 5.3, (Q)_{Σ,S} ⊑_S (Q)_{Σ,BS} ⊑_S (Q)_{Σ,B} ⊑_S Q. □

Queries Q4, Q3 = (Q4)_{Σ,B}, Q2 = (Q4)_{Σ,BS}, and
Q1 = (Q4)_{Σ,S} of Example 4.1 provide an illustration.

6.2 Equivalence Tests for Aggregate Queries 

We now provide dependency-free tests for equivalence
of CQ queries with grouping and aggregation under
embedded dependencies. The results of this subsection are
immediate from Theorems 2.2, 2.3, and 6.2.

Theorem 6.3. Given compatible aggregate queries Q
and Q', and a set of embedded dependencies Σ such that
there exist set-chase results (Q̆)_{Σ,S} for the core Q̆ of
Q and (Q̆')_{Σ,S} for the core Q̆' of Q'. Then (1) For
max or min queries Q and Q', Q ≡_Σ Q' if and only
if (Q̆)_{Σ,S} ≡_S (Q̆')_{Σ,S} in the absence of dependencies.
(2) For sum or count queries Q and Q', Q ≡_Σ Q' if
and only if (Q̆)_{Σ,BS} ≡_BS (Q̆')_{Σ,BS} in the absence of
dependencies. □

6.3 Sound and Complete Reformulation of 
CQ and CQ-Aggregate Queries 

Theorems 6.1 and 6.2 allow us to extend the
algorithm C&B of [11] to (a) reformulation of CQ queries in
presence of embedded dependencies under bag or
bag-set semantics, and to (b) reformulation of CQ queries
with grouping and aggregation in presence of embedded
dependencies. Our proposed algorithm Bag-C&B
returns Σ-minimal reformulations Q' of CQ query Q such
that Q' ≡_{Σ,B} Q under the given embedded
dependencies Σ. The only modifications to C&B that are required
to obtain Bag-C&B are (i) to replace the set-chase
procedure by the sound bag-chase procedure as defined
in this paper, and (ii) to replace the dependency-free
equivalence test of Theorem 2.2 by the test of
Theorem 6.1. The algorithm Bag-Set-C&B for the case of
bag-set semantics is obtained in an analogous fashion.

We have also developed algorithms that accept sets of
embedded dependencies and CQ queries with grouping
and aggregation: Max-Min-C&B accepts CQ queries
with aggregate function max or min, and
Sum-Count-C&B accepts CQ queries with aggregate function sum
or count. Max-Min-C&B uses C&B to obtain all
Σ-minimal reformulations Q̆' ≡_{Σ,S} Q̆ of the core Q̆ of the
input query Q, and for each such query Q̆' returns a
query Q'' whose head is the head of Q and whose body is
the body of Q̆'. Sum-Count-C&B works analogously,
except that it uses Bag-Set-C&B to produce queries
Q̆' ≡_{Σ,BS} Q̆. By Theorem 6.3, for each output Q'' of
Max-Min-C&B or of Sum-Count-C&B it holds that
Q'' ≡_Σ Q whenever set-chase of Q̆ using Σ terminates.

All our algorithms are sound and complete whenever
set-semantics chase of Q using Σ terminates.

Theorem 6.4. Given CQ query Q and set Σ of
embedded dependencies such that set chase of Q under Σ
terminates in finite time. Then Bag-C&B returns all
Σ-minimal reformulations Q' such that Q' ≡_{Σ,B} Q. □



The analogs of Theorem 6.4 for (a) CQ queries under
bag-set semantics, and for (b) aggregate CQ queries can
be found in Appendix K. All the theorems follow from
the soundness and completeness of C&B of [11] (see
Appendix A) and from the results of this paper.

7. RELATED WORK 

Chandra and Merlin [5] developed the NP-complete
containment test of two CQ queries under set
semantics. This test has been used in optimization of CQ
queries, as well as in developing algorithms for
rewriting queries (both equivalently and nonequivalently)
using views. Please see [11, 17, 21, 23] for discussions
of the state of the art and of the numerous practical
applications of query rewriting using views.

The problem of developing tests for equivalence of CQ
queries under bag and bag-set semantics was solved by
Chaudhuri and Vardi in [4]. The results on containment
tests for CQ queries under bag semantics have proved
to be more elusive. Please see Jayram and colleagues
[18] for original undecidability results for containment
of CQ queries with inequalities under bag semantics.
The authors point out that it is not known whether
the problem of bag containment for CQ queries is even
decidable. On the other hand, the problem of
containment of CQ queries under bag-set semantics reduces to
the problem of containment of aggregate queries with
aggregate function count(*). The latter problem is
solvable using the methods proposed in [7].

Studies of dependencies have been motivated by the
goal of good database-schema design. See [1, 10] for
overviews and references on dependencies and chase.
In [9], Deutsch developed chase methods for bag-specific
constraints (UWDs), and proved completeness of the
view-based version of the Chase and Backchase
algorithm (C&B, [11]) for mixed semantics and for set and
bag dependencies, in case where all given
tuple-generating dependencies are UWDs. In contrast, the algorithm
in [13] is complete in presence of just functional
dependencies. Algorithms that are complete in the
absence of dependencies are given in [20] for set semantics,
in [3] for bag semantics, and in [16] for bag-set
semantics. Finally, Cohen in [6] presented an equivalence test
for CQ queries in presence of inclusion dependencies⁹
for the cases of bag-set semantics and of the semantics
where queries are evaluated on set-valued databases
using both bag-valued and set-valued intermediate results.
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APPENDIX 

A. THE C&B ALGORITHM OF [10] 

In this section of the appendix we give an overview of
the Chase and Backchase (C&B) algorithm by Deutsch
and colleagues, see [11] for the details. Under set
semantics for query evaluation and given a CQ query Q,
C&B outputs all equivalent Σ-minimal conjunctive
reformulations of Q in presence of the given embedded
dependencies Σ, whenever chase of Q under Σ
terminates in finite time.

C&B proceeds in two phases. The first phase of C&B,
its chase phase, does chase of Q using Σ under set
semantics, to obtain terminal chase result (Q)_{Σ,S}. This
output of the chase phase is called the universal plan U
for Q. Note that by construction of U, Q ≡_{Σ,S} U.

The second phase of C&B, its backchase phase, pro- 
ceeds as follows: 

1. Iterate over all queries U' whose head is head(U)
and whose body is not empty and is body(U) with
zero or more atoms dropped.

2. Chase each U' using Σ, to obtain terminal chase
result (U')_{Σ,S}.

3. C&B outputs each U' such that for the terminal
result (U')_{Σ,S} of chasing the candidate
reformulation U' under Σ (under set semantics), it holds that
(U')_{Σ,S} ≡_S U, that is, each U' for which by
Theorem 2.2 it holds that U' ≡_{Σ,S} Q.
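
The three backchase steps above can be sketched in Python as a loop over nonempty subsets of the universal plan's body. This is a minimal sketch under stated assumptions: `chase` and `equivalent` are hypothetical caller-supplied callables standing in for set-chase under Σ and for the dependency-free equivalence test, query bodies are modeled as plain collections of atoms, and the toy dependency ("a implies b") is invented for illustration only. Any further filtering of the accepted candidates for Σ-minimality is not shown.

```python
from itertools import combinations


def backchase(universal_plan, chase, equivalent):
    """Enumerate candidate bodies U' and keep those accepted in step 3."""
    atoms = list(universal_plan)
    accepted = []
    # Step 1: every nonempty sub-body of the universal plan.
    for size in range(1, len(atoms) + 1):
        for candidate in combinations(atoms, size):
            # Step 2: chase the candidate; step 3: compare with U.
            if equivalent(chase(candidate), atoms):
                accepted.append(candidate)
    return accepted


# Toy stand-ins (assumptions): atoms are strings, and the only
# dependency says that atom "a" forces atom "b" to be added.
def toy_chase(body):
    result = set(body)
    if "a" in result:
        result.add("b")
    return result


def toy_equivalent(body1, body2):
    return set(body1) == set(body2)


candidates = backchase(["a", "b"], toy_chase, toy_equivalent)
print(candidates)
```

In the toy run, the candidate consisting of atom "b" alone is rejected because chasing it does not recover the universal plan.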

Theorem A.1. (C&B is sound and complete)
For an arbitrary instance of the Query-Reformulation
Problem with a CQ query Q, set semantics for query
evaluation, and a set of embedded dependencies Σ such
that chasing Q under Σ terminates in finite time, C&B
outputs all Σ-minimal conjunctive reformulations Q' of
Q such that Q' ≡_{Σ,S} Q. □

The proof of Theorem A.1 is by construction of C&B.

B. KEYS OF RELATIONS 

This section of the appendix provides basic definitions
for the standard notion of a key of a relation [15].

B.l Attributes and Relations 

Let 𝒰 be a countably infinite set of attributes. The
universe U is a finite subset of 𝒰. A relation schema R
of arity k is a subset of U of cardinality k. A database
schema (or, simply, schema) 𝒟 over U is a finite set of
relation schemas {R_1, . . . , R_t} with union U, of arities
k_1, . . . , k_t, respectively.

Each attribute A has an associated set of values
Δ(A), called A's domain. The domain is the set of
values Δ = ∪_{A ∈ U} Δ(A). Let 𝒟 be a schema over U,
R ∈ 𝒟 a relation schema and X a subset of U. An
X-tuple t is a mapping from X into Δ, such that each
attribute A ∈ X is mapped to an element of Δ(A). A
(generally bag-valued) relation r over R is a finite
collection of R-tuples. A database (instance) D of 𝒟 is a
set of relations, with one relation for each relation
schema of 𝒟.

B.2 Functional Dependencies and Keys 

Consider a database schema 𝒟 with n-ary relation
symbol P such that n > 1. A functional dependency
(fd) on relation P in 𝒟 is an egd of the form
p(X, Y, Z) ∧ p(X, Y', Z') → Y = Y', such that predicate
p corresponds to relation P. Here, Y and Y' must be in
the same position in the respective atoms, meaning the
following. Let Y be the ith argument of atom p(X, Y, Z),
for some 1 ≤ i ≤ n. Then Y' is the ith argument of
atom p(X, Y', Z'). Similarly, we require each element
of the vector X to be in the same position in each of
p(X, Y, Z) and p(X, Y', Z').

Definition B.1. (Implied functional dependency)
Let σ be an fd on relation R, and let Σ be a set of fds
on R. Then σ is a functional dependency implied by Σ
if σ holds on all instances of relation R that satisfy Σ.
□

Standard textbooks (see, e.g., [15]) describe algorithms
for solving the problem of finding all fds implied by a
given set of dependencies on the schema of a relation.

Let K = {A_{i1}, . . . , A_{ip}} be a nonempty proper subset
of the set of attributes of n-ary relation R(A_1, . . . , A_n),
with n > 1. That is, 1 ≤ p < n and A_{ij} ∈ {A_1, . . . , A_n}
for each j ∈ {1, . . . , p}. In the definitions that follow, we
will use the following notation: Let σ(K|A_i), for some
i ∈ {1, . . . , n} such that A_i ∉ K, denote an fd that
equates the values of attribute A_i of R whenever the
two r-atoms in the left-hand side of σ(K|A_i) agree on
the values of all and only attributes in K. For example,
if the schema of R is R(A, B, C, D) and K = {A, C},
then σ(A, C|B) is defined as

σ(A, C|B) : r(A, B_1, C, D_1) ∧ r(A, B_2, C, D_2) → B_1 = B_2.



Definition B.2. (Superkey of relation) K is a
superkey of relation R if for each attribute A in the set
{A_1, . . . , A_n} − K, it holds that fd σ(K|A) is implied by
the set Σ of fds on R. □

The set of all attributes of R is also a superkey of R.

Definition B.3. (Key of relation) K is a key of
relation R if (1) K is a superkey of R, and (2) for each
nonempty proper subset K' of K, K' is not a superkey
of R. □
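
Definitions B.2 and B.3 can be checked mechanically with the standard textbook attribute-closure computation for fds. The following Python sketch is offered as an illustration only: the function names and the encoding of each fd as a pair (left-hand-side attributes, single right-hand-side attribute) are assumptions of this sketch, and the sample fds are made up around the schema R(A, B, C, D) used above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given fds.

    Each fd is a pair (lhs, rhs): lhs is a tuple of attribute names,
    rhs is a single attribute name.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def is_superkey(k, all_attrs, fds):
    # K is a superkey iff every attribute of R is determined by K.
    return set(all_attrs) <= closure(k, fds)


def is_key(k, all_attrs, fds):
    # A key is a superkey none of whose nonempty proper subsets
    # is itself a superkey.
    if not is_superkey(k, all_attrs, fds):
        return False
    members = sorted(k)
    return not any(
        is_superkey(subset, all_attrs, fds)
        for size in range(1, len(members))
        for subset in combinations(members, size))


attrs = ["A", "B", "C", "D"]
# Hypothetical fds: {A, C} determines B, and {A, C} determines D.
fds = [(("A", "C"), "B"), (("A", "C"), "D")]
print(is_key({"A", "C"}, attrs, fds))
print(is_key({"A", "B", "C"}, attrs, fds))
```

Under these sample fds, {A, C} is a key, whereas {A, B, C} is a superkey that is not a key because it properly contains {A, C}.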

C. TUPLE IDS FOR RELATIONS 

In this section of the appendix we present a
solution to the problem of ensuring, under bag semantics,
that certain base relations are sets in all database
instances. To this end, we provide here a formal
framework for tuple IDs, which are unique tuple identifiers
commonly used in implementations of real-life
database-management systems [15]. Our approach to ensuring
that some relations are always set valued is to use
functional dependencies (Appendix B) to force certain
relations to be set valued, by restricting tuples with the
same "contents" (that is, all values with the exception
of the tuple ID) to have the same tuple ID.

Assume bag semantics for query evaluation and
consider relation symbol R_i in database schema 𝒟.
(Section B.1 has the relevant definitions.) We follow the
approach taken in implementations of real-life
database-management systems [15] by incrementing the arity of
R_i. As a result, the arity of each relation R_i becomes
k_i + 1 instead of the original k_i as defined in
Section B.1.¹⁰

Let 𝒟' be the schema resulting from such arity
modification in 𝒟 for each relation R_i. By D' we denote
instances of 𝒟'. In the schema of R_i in 𝒟', let the last
attribute of R_i be the attribute for the tuple ID. The
values of all tuple IDs are required to be distinct in all
instances of 𝒟', which is formally specified as follows.

Definition C.1. (Tuple ID.) For a relation symbol
R_i of arity k_i + 1 in database schema 𝒟', let queries
Q^{(i)}_{tid} and Q^{(i)}_{vals} be as follows:

Q^{(i)}_{tid}(X_{k_i+1}) :- R_i(X_1, . . . , X_{k_i}, X_{k_i+1}).

Q^{(i)}_{vals}(X_1, . . . , X_{k_i}) :- R_i(X_1, . . . , X_{k_i}, X_{k_i+1}).

Then the (k_i + 1)st attribute of R_i in 𝒟' is the tuple
ID for R_i if in all instances D' of 𝒟', the following
relationship holds between the relations Q^{(i)}_{tid}(D', B)
and Q^{(i)}_{vals}(D', B):

|coreSet(Q^{(i)}_{tid}(D', B))| = |Q^{(i)}_{vals}(D', B)|.

Here, coreSet(B) denotes the core-set of bag B, and
|B| denotes cardinality of B. □

We now study the relationship between instances D'
of 𝒟' and instances D of 𝒟. Suppose that for relation R_i
of arity k_i + 1 in 𝒟', the last attribute of R_i is the tuple
ID of R_i. By definition of tuple IDs, for each instance
D of 𝒟, relation R_i in D can be obtained from some
instance D' of 𝒟', by evaluating query Q^{(i)}_{vals} under
bag semantics on R_i in D':

Q^{(i)}_{vals}(X_1, . . . , X_{k_i}) :- R_i(X_1, . . . , X_{k_i}, X_{k_i+1}).

¹⁰We emulate the standard implementation practice that
tuple IDs be invisible to the users of the database system; that
is, in our approach the user assumes that the arity of each
relation R_i is still k_i.

Now suppose that in (the original) schema 𝒟, a
relation with symbol R_i and arity k_i is required to be set
valued in all instances of 𝒟. We enforce this
requirement by the functional dependency

σ^{(i)}_{tid} : R_i(X_1, . . . , X_{k_i}, X_{k_i+1}) ∧
R_i(X_1, . . . , X_{k_i}, Y_{k_i+1}) → X_{k_i+1} = Y_{k_i+1}

on R_i in schema 𝒟'. This functional dependency
enforces the same tuple ID for each pair of tuples that
agree on the values of all other attributes of R_i. In
conjunction with Definition C.1, which ensures uniqueness
of each tuple ID within each instance of 𝒟', σ^{(i)}_{tid}
enforces that the answer to query Q^{(i)}_{vals} (i.e., R_i in
schema 𝒟) be set valued when computed under bag
semantics.
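
The interplay between tuple-ID uniqueness (Definition C.1) and the functional dependency on tuple IDs can be illustrated with a small Python sketch. The encoding is an assumption of this sketch: an instance of R_i in the extended schema is modeled as a list of tuples whose last component is the tuple ID, and the function names and sample rows are hypothetical.

```python
from collections import Counter


def tuple_ids_are_unique(rows):
    # Definition C.1: the number of distinct tuple IDs equals the
    # number of tuples (counting duplicates) returned by the vals query.
    tids = [row[-1] for row in rows]
    return len(set(tids)) == len(tids)


def satisfies_tid_fd(rows):
    # The fd: tuples that agree on all non-ID attributes must also
    # agree on the tuple ID.
    tid_of = {}
    for row in rows:
        vals, tid = row[:-1], row[-1]
        if tid_of.setdefault(vals, tid) != tid:
            return False
    return True


def vals_answer(rows):
    # Bag answer to the vals query: project out the tuple ID.
    return Counter(row[:-1] for row in rows)


def is_set_valued(bag):
    return all(count == 1 for count in bag.values())


with_duplicate = [(1, 3, "t1"), (1, 3, "t2")]
without_duplicate = [(1, 3, "t1"), (2, 4, "t2")]
print(satisfies_tid_fd(with_duplicate))
print(is_set_valued(vals_answer(without_duplicate)))
```

In the first sample instance the tuple IDs are unique but the fd is violated, and the projected relation contains a duplicate; in the second instance both conditions hold and the projected relation is a set.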

In the context of Example 4.1, in presence of tuple
IDs we could formally define dependency σ6 as an egd:

σ6 : t(X, Y, Z, U) ∧ t(X, Y, Z, W) → U = W.

Here, the fourth attribute of relation T is the tuple-ID
attribute.

D. PROOF OF THEOREM 4.2 

This section of the appendix provides a proof of
Theorem 4.2. We first supply the details of Example 4.9.

EXAMPLE D.1. To show that query Q3 of
Example 4.1 is not bag equivalent to query Q5,

Q3(X) :- p(X, Y), t(X, Y, W), s(X, Z).

Q5(X) :- p(X, Y), t(X, Y, W), s(X, Z), s(X, Z).

we construct a bag-valued database D, with the following
relations: P = {{(1,2)}}, R = ∅, S = {{(1,3), (1,3)}},
T = {{(1,2,5)}}, and U = ∅. On this database D,
the answer to Q3 is Q3(D, B) = {{(1), (1)}}, whereas
Q5(D, B) = {{(1), (1), (1), (1)}}, by rules of bag
semantics. From the fact that Q3(D, B) and Q5(D, B) are
not the same bags, we conclude that bag equivalence
Q3 ≡_B Q5 does not hold.

At the same time, by Theorem 4.2 it holds that Q3 and
Q5 are bag equivalent on all databases where relation S
is required to be a set. □
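
The multiplicities in Example D.1 can be reproduced with a small bag-semantics evaluator for conjunctive queries. The Python sketch below is illustrative only: the function `eval_bag`, the encoding of a query body as a list of (relation name, variable tuple) pairs, and the representation of each relation as a `Counter` from tuples to multiplicities are assumptions of this sketch. The empty relations R and U are omitted because neither query mentions them.

```python
from collections import Counter
from itertools import product


def eval_bag(head, body, db):
    """Evaluate a CQ under bag semantics.

    Each satisfying assignment contributes the product of the
    multiplicities of the tuples that its subgoals are mapped to.
    """
    answer = Counter()
    # For each subgoal, the distinct tuples it could be mapped to.
    options = [list(db[rel].items()) for rel, _ in body]
    for combo in product(*options):
        assignment, multiplicity, consistent = {}, 1, True
        for (_, args), (tup, count) in zip(body, combo):
            for var, value in zip(args, tup):
                if assignment.setdefault(var, value) != value:
                    consistent = False
            multiplicity *= count
        if consistent:
            answer[tuple(assignment[v] for v in head)] += multiplicity
    return answer


q3_body = [("P", ("X", "Y")), ("T", ("X", "Y", "W")), ("S", ("X", "Z"))]
q5_body = q3_body + [("S", ("X", "Z"))]

# Database D of Example D.1: S holds two copies of the tuple (1, 3).
db = {
    "P": Counter({(1, 2): 1}),
    "S": Counter({(1, 3): 2}),
    "T": Counter({(1, 2, 5): 1}),
}
q3_answer = eval_bag(("X",), q3_body, db)
q5_answer = eval_bag(("X",), q5_body, db)

# The same database with S forced to be a set.
db_set = dict(db, S=Counter({(1, 3): 1}))
print(q3_answer, q5_answer)
```

On D the two answers differ in the multiplicity of the tuple (1), while on the variant where S is a set the two queries return the same bag, in line with the closing remark of the example.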

We now prove Theorem 4.2. The If part of the proof
is straightforward. For the Only-If part, we argue that
the only way for Q1 and Q2 to be bag equivalent under
the set-enforcing constraints of database schema 𝒟 is
for Q1 and Q2 to satisfy the conditions of Lemma D.1.
The proof of Lemma D.1 completes the proof of
Theorem 4.2 by showing by contrapositive that bag
equivalence of Q1 and Q2 under the set-enforcing constraints
of database schema 𝒟 has to entail isomorphism of the
queries Q'1 and Q'2 defined in the statement of
Theorem 4.2.

Proof. (Theorem 4.2)

If. Let database schema T> have a relation symbol P, 
such that the relation for P is set valued in all (bag- 
valued) instances D over "D. (Appendix O provides 
an approach to enforcing this set-valuedness constraint 
using functional dependencies that involve tuple IDs.) 



Consider an arbitrary CQ query Q1 that has a subgoal
with predicate p corresponding to relation P; w.l.o.g.
let the subgoal be p(W). Let Q2 be a CQ query obtained
by adding to the body of Q1 a duplicate of p(W).

We argue that for Q1 and Q2 as described above, it
holds that Q1 ≡_B Q2 under the set-enforcing dependencies
of the schema 𝒟. (The claim of the If direction
of the theorem is immediate from this observation.)
Indeed, consider an arbitrary instance D of database
schema 𝒟, such that D satisfies the set-enforcing dependencies
of the schema 𝒟. From the definition of bag
semantics for query evaluation it follows that each assignment
satisfying the body of Q1 w.r.t. D is also a
satisfying assignment for the body of Q2 w.r.t. D, and
vice versa. Further, each such satisfying assignment γ
maps p(W), in the body of Q1, into a single tuple t in
relation P in D, and similarly γ maps both copies of
p(W), in the body of Q2, into the same single tuple t,
due to relation P being set valued in the database D.
It follows that each such satisfying assignment γ contributes
to each of Q1(D, B) and Q2(D, B) the same
number of tuples under bag semantics for query evaluation.
The claim of the If direction of Theorem 4.2 is
immediate from the above observation.
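As a small sanity check of the If direction, the sketch below compares a query with and without a duplicated subgoal over relation P. It is an illustration only, not the paper's construction: the queries, the data, and the helper `eval_bag` are assumptions, and bag semantics is taken to mean that each satisfying assignment contributes the product of the multiplicities of the tuples it uses.

```python
from collections import Counter

def eval_bag(head, body, db):
    """Bag-semantics evaluation of a CQ query (illustrative helper)."""
    answer = Counter()

    def extend(i, assignment, mult):
        if i == len(body):
            answer[tuple(assignment[v] for v in head)] += mult
            return
        rel, args = body[i]
        for tup, count in db.get(rel, Counter()).items():
            new = dict(assignment)
            if all(new.setdefault(v, c) == c for v, c in zip(args, tup)):
                extend(i + 1, new, mult * count)

    extend(0, {}, 1)
    return answer

q1 = [("p", ("X", "Y")), ("r", ("X",))]
q2 = q1 + [("p", ("X", "Y"))]          # Q1 plus a duplicate of p(X, Y)

# P is set valued here (every multiplicity is 1); R is a proper bag.
set_db = {"p": Counter({(1, 2): 1, (1, 3): 1}), "r": Counter({(1,): 3})}
same_on_set = eval_bag(("X",), q1, set_db) == eval_bag(("X",), q2, set_db)

# Once P itself has a duplicate, the two answers separate.
bag_db = {"p": Counter({(1, 2): 2}), "r": Counter({(1,): 3})}
a1 = eval_bag(("X",), q1, bag_db)
a2 = eval_bag(("X",), q2, bag_db)
print(same_on_set, a1[(1,)], a2[(1,)])
```

On the set-valued P every tuple has multiplicity 1, so the extra factor introduced by the duplicate subgoal is always 1; on the bag-valued P the factor is 2 and the answers diverge.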

Only-If. The proof is by contrapositive. For two CQ
queries Q1 and Q2, let Q1 ≡_B Q2 hold in the absence
of all dependencies other than the set-enforcing dependencies
of the schema 𝒟. Consider queries Q1' and Q2'
defined in the statement of Theorem 4.2. We assume
that Q1' and Q2' are not isomorphic, and obtain from
this assumption that Q1 and Q2 are not bag equivalent
on at least one database that satisfies the set-enforcing
dependencies of schema 𝒟, in contradiction with what
we are given.

W.l.o.g., let s be a subgoal of query Q1' such that
either Q2' has no subgoals with the predicate of s, or
Q2' has fewer (but still a positive number of) subgoals
with the predicate of s than Q1' does. Consider first
the case where Q2' has no subgoals with the predicate
of s; it follows from the construction of queries Q1' and
Q2' that Q2 does not have subgoals with the predicate
of s either, whereas Q1 has at least one occurrence of a
subgoal with the predicate of s. Observe that in this
case, set equivalence between Q1 and Q2 does not hold
by the results of [2]. From the result of [4] (see Proposition 2.1
in this current paper) that bag equivalence
implies set equivalence, it follows immediately that bag
equivalence of Q1 and Q2 cannot hold either, in presence
of the set-enforcing dependencies in the schema
𝒟. (This follows from the fact that Q2 ⋢_S Q1 implies
existence of a set-valued database on which Q2 under
set semantics produces a tuple t, such that t is not in
the set-semantics answer to Q1 on the same database.)
Thus, we have arrived at a contradiction with our assumption
that Q1 ≡_B Q2 on all databases satisfying the
set-enforcing dependencies of the schema 𝒟.

We now consider the remaining case concerning the
number in Q2' of subgoals with the predicate of s, that
is, the case where Q2' has fewer (but still a positive number
of) subgoals with the predicate of s than Q1' does.
Suppose first that there is no bag-set equivalence between
Q1 and Q2. That is, by Theorem 2.1 we assume
that the canonical representations of Q1' and of
Q2' (which are the same as the canonical representations
of Q1 and of Q2, respectively) are not isomorphic.
Then similarly to the previous case considered in this
proof, from Proposition 2.1 we obtain immediately the
contradiction to Q1 ≡_B Q2 under the set-enforcing dependencies
of schema 𝒟. (Similarly to the case above,
Q1 ≡_BS Q2 would have to be violated on a set-valued
database; therefore the set-enforcing dependencies of
the schema 𝒟 would be satisfied, and Q1 ≡_B Q2
would be violated on the same database.)

Thus, for the rest of this proof we assume that (1)
Q1 ≡_BS Q2, and (2) Q1' ≡_BS Q2' (from Q1 ≡_BS Q2 and
by construction of Q1' and Q2'). That is, for both pairs
of queries the canonical representations are isomorphic.
Under these restrictions, the only way Q1' and Q2' can
be nonisomorphic is the case where Q1' (w.l.o.g.) has
more subgoals (than Q2') whose predicate corresponds
to a relation, say R, that is not required to be a set in all
instances of schema 𝒟. (Indeed, if Q1 and Q2 have this
number-of-subgoals discrepancy for a predicate whose
relation is required to be a set in all instances of 𝒟, then
Q1' and Q2' must have the same number of such subgoals
by Q1 ≡_BS Q2 and by construction of Q1' and Q2'.)
Note that in this case, relation symbol R must belong
to 𝒟 − {P1, ..., Pk} ("−" is set difference), and thus the
subset relationship {P1, ..., Pk} ⊆ 𝒟 is proper in this
case, that is {P1, ..., Pk} ⊂ 𝒟. Recall that {P1, ..., Pk}
is the maximal subset of 𝒟 such that all symbols in
{P1, ..., Pk} correspond to relations required to be set
valued in all instances of 𝒟.

We finish the proof of Theorem 4.2 by proving Lemma D.1,
which constructs a database D satisfying the set-enforcing
dependencies of schema 𝒟. By construction,
database D is a counterexample to Q1 ≡_B Q2 (on
databases satisfying the set-enforcing dependencies of
schema 𝒟), whenever Q1' has more subgoals (than Q2')
whose predicate corresponds to a relation that is not
required to be a set in all instances of schema 𝒟. □

Lemma D.1. Let 𝒟, {P1, ..., Pk} ⊆ 𝒟, Q1, Q2, Q1',
and Q2' be as specified in Theorem 4.2, and let Q1 ≡_BS
Q2. Let R be a relation symbol in the set 𝒟 − {P1, ..., Pk};
that is, relation R is not required to be a set in all instances
of 𝒟. Suppose that Q1' has strictly more subgoals
whose predicate corresponds to R than Q2' does. Then
there exists an instance D of 𝒟 such that all of relations
P1, ..., Pk are set valued in D, and such that Q1(D, B)
is not the same bag as Q2(D, B). □

By the above characterization, database D is a counterexample
to queries Q1 and Q2 being bag equivalent
on all instances of 𝒟 that satisfy the set-enforcing restrictions
of schema 𝒟.

The intuition for the proof of Lemma D.1 is as follows.
Let query Q1' have n1 > 1 subgoals whose predicate corresponds
to relation R, such that R is not required to
be set valued in instances of schema 𝒟. (Part of the
proof is to show that by the properties of this relation
symbol R and by construction of Q1' from Q1, it holds
that Q1 and Q1' have exactly the same number of subgoals
whose predicate corresponds to R. We make the
same observation about Q2 and Q2'.) Further, let query
Q2' have a positive number (by proof of Theorem 4.2)
n2 < n1 of subgoals whose predicate corresponds to R.
We build a database D on which Q1 produces at least
$m^{(n_1)}$ copies of some (distinct) tuple t*, with the positive
integer value of m to be determined. We then "let"
Q2 have as many satisfying assignments for the body of
Q2 w.r.t. this database D as possible. That is, we assume
the best case for Q2 of producing as many tuples
on database D as possible. We then show that if the
value of m is chosen in a certain way, then the number
$m^{(n_1)}$ of copies of tuple t* in the bag Q1(D, B) is greater
than the maximal (i.e., best-case) number N of all tuples
(counting all duplicate tuples as separate tuples)
that can be contributed by Q2 to the bag Q2(D, B).
The reason that we can make such a choice of the value
of m is that this maximal number N grows asymptotically
as $m^{(n_2)}$, with 0 < n2 < n1, whereas the number
of copies of tuple t* in the bag Q1(D, B) is $m^{(n_1)}$.






Proof. (Lemma D.1) Let n1 be the number of subgoals
in Q1' whose predicate corresponds to R, and let
n2 be the number of subgoals in Q2' whose predicate
corresponds to R; n2 > 0 by Q1 ≡_BS Q2. By our assumption,
n1 ≥ n2 + 1. By construction of Q1', Q1
has the same number n1 of subgoals whose predicate
corresponds to R as Q1' does; we make the same observation
about the relationship between the number n2 of
subgoals in Q2' whose predicate corresponds to R and
the (same) number n2 of subgoals of the same type in
Q2. (See proof of Theorem 4.2 for the details of the
argument.)

Let D' be the (set-valued by definition, see Section 2.1)
canonical database for the canonical representation of
Q1'. (From the proof of Theorem 4.2 we have that
Q1, Q1', Q2, and Q2' all have the same canonical representation.)
We construct from D' our counterexample
database D as follows.

1. For each relation symbol S in 𝒟 − {R}, the relation
S in D is the same as the relation S in D'. By
construction of D', all the relations in {P1, ..., Pk} are
set valued in database D. Thus, database D satisfies
the set-enforcing restrictions of the schema 𝒟.

2. We build relation R in D by "putting together"
m > 0 copies of relation R in D', with the value of m
to be determined shortly. That is, for each tuple t such
that t is in the set-valued relation R in D', relation R in
D has m copies of tuple t. Further, R in D has no other
tuples.

By definition of bag semantics for query evaluation,
see Section 2.2, the bag Q1(D, B) has at least $m^{(n_1)}$
copies of some individual tuple. Indeed, consider the
assignment mapping γ from Q1 to D such that γ was
used to generate the canonical database D' of the canonical
representation of Q1. (See Section 2.1 for the description
of the process of construction of a canonical
database for a CQ query.) Observe that γ is a satisfying
assignment for the body of Q1 w.r.t. database D. The
assignment γ maps each of the R-subgoals of Q1 to at
least m tuples of R, by construction of relation R in D,
and γ maps each non-R subgoal (if any) of Q1 to exactly
one tuple. Thus, for the tuple t* = γ(X) ∈ Q1(D, B),
where Q1(X) is the head of the query Q1, the multiplicity
of t* in Q1(D, B) is at least $m^{(n_1)}$. (The "at least"
part comes from the possibility that extra copies of the
tuple t* could be contributed to the bag Q1(D, B) by
one or more satisfying assignments γ' for the body of
Q1 w.r.t. database D, such that each such γ' is not
identical to the assignment γ.)

At the same time, we show that the total size of the
bag Q2(D, B) cannot exceed

$$n_1^{(2 n_2)} \times n_4^{(n_3 - n_2)} \times m^{(n_2)} \qquad (4)$$

tuples, in case the total number n3 of subgoals of query
Q2 is greater than n2; n4 is the number of subgoals of
Q1 whose (subgoals') predicate does not correspond to
relation symbol R. (By Q1 ≡_BS Q2 we have that n4 > 0
whenever n3 > n2.) In this case, we set the value m* of
m to

$$m^* = 1 + n_1^{(2 n_2)} \times n_4^{(n_3 - n_2)}.$$

It follows that

$$(m^*)^{(n_1 - n_2)} \geq m^* > n_1^{(2 n_2)} \times n_4^{(n_3 - n_2)}.$$

That is (recall that 0 < n2 < n1),

$$(m^*)^{(n_1)} > n_1^{(2 n_2)} \times n_4^{(n_3 - n_2)} \times (m^*)^{(n_2)}.$$

We conclude that on the database D where the value
of m is fixed at m*, the number of copies of tuple t* in
the bag Q1(D, B) exceeds the number of all tuples in
the bag Q2(D, B). Therefore, the bag Q1(D, B) is not
the same bag as Q2(D, B).

(In case the total number n3 of subgoals of query Q2
is equal to n2, we show that the bag Q2(D, B) cannot
have more than

$$n_1^{(2 n_2)} \times m^{(n_2)}$$

tuples. In this case, we set the value m* of m to

$$m^* = 1 + n_1^{(2 n_2)}.$$

It follows that at this value m* of m, we have that

$$(m^*)^{(n_1 - n_2)} \geq m^* > n_1^{(2 n_2)}.$$

That is (recall that 0 < n2 < n1),

$$(m^*)^{(n_1)} > n_1^{(2 n_2)} \times (m^*)^{(n_2)}.$$

We conclude that on the database D where the value
of m is fixed at m*, the number of copies of tuple t*
in the bag Q1(D, B) exceeds the number of all tuples
in the bag Q2(D, B). Therefore, the bag Q1(D, B) is
not the same bag as Q2(D, B). The proof of this case is
straightforward from the proof, see below, of Equation 4
for the case where the total number n3 of subgoals of
query Q2 is greater than n2.)

We now explain why the bag Q2(D, B) cannot be of
greater cardinality than the number of tuples specified
in Equation 4, in the case where the total number n3 of
subgoals of query Q2 is greater than n2. The idea of the
proof is to "let" Q2 have as many satisfying assignments
for the body of Q2 w.r.t. database D as possible. That
is, we assume the best case for Q2 of producing as many
tuples on database D as possible. We take the following
specific steps in building the upper bound:

1. We assume the best case for Q2 of the number of
satisfying assignments, w.r.t. database D, for the
n3 − n2 subgoals of Q2 whose (subgoals') predicates
do not correspond to R. The maximal number of
such assignments cannot exceed

$$n_4^{(n_3 - n_2)} \qquad (12)$$

That is, the best case for Q2 is to assume that all
of the n3 − n2 subgoals of Q2 have the same predicate,
say predicate s corresponding to the relation
symbol S, where S may or may not be one of the
relation symbols P1, ..., Pk specified in the formulation
of this Lemma. We also assume that the
n4 > 0 non-R subgoals of Q1 also have the same
predicate s. Database D has at most n4 tuples in
relation S (by construction of canonical databases).
We assume the best case for Q2 that each of the
n3 − n2 subgoals of Q2 can map independently into
each of the (at most) n4 tuples, hence the formula
of Equation 12.

2. For each of the above assignments, Q2
may have at most

$$n_1^{(n_2)} \qquad (13)$$

satisfying assignments, w.r.t. database D, for all
the n2 subgoals of Q2 whose predicate corresponds
to the relation symbol R. The computations are
similar to those that we used in explaining Equation 12.

3. For each of the $n_1^{(n_2)}$ satisfying assignments, w.r.t.
database D, for all the n2 subgoals of Q2 whose
predicate corresponds to the relation symbol R, Q2
can produce on database D at most

$$(n_1 \times m)^{(n_2)} \qquad (14)$$

tuples. We obtain the formula of Equation 14 by
assuming that the evaluation of Q2 admits a Cartesian
product of n2 copies of the relation R, where
relation R has at most n1 × m tuples on D.

4. We combine Equations 12, 13, and 14 to obtain
that the total number of satisfying assignments for
the body of Q2 w.r.t. database D cannot exceed

$$n_1^{(n_2)} \times n_4^{(n_3 - n_2)} \qquad (15)$$

(satisfying assignments); and that, further, for each
one of these assignments Q2 produces on database
D at most

$$(n_1 \times m)^{(n_2)} \qquad (16)$$

tuples (where each duplicate is counted separately)
in the bag Q2(D, B). (Recall that all relations except
R are set valued in database D.) We conclude
that the total number of tuples (including duplicates)
that query Q2 produces on database D is at
most

$$(n_1)^{2(n_2)} \times n_4^{(n_3 - n_2)} \times m^{(n_2)} \qquad (17)$$

tuples. Equation 17 gives us an upper bound on
the size of the bag Q2(D, B). Q.E.D.

□ 
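The arithmetic behind the choice of m* can be spot-checked numerically. The sketch below is an illustration under the assumption that the bound on the size of Q2(D, B) is n1^(2·n2) × n4^(n3−n2) × m^(n2) and that m* is one plus the m-independent factor of that bound; the function names `upper_bound`, `m_star`, and `separates` are hypothetical.

```python
def upper_bound(n1, n2, n3, n4, m):
    """Assumed bound on |Q2(D, B)|: n1^(2*n2) * n4^(n3-n2) * m^n2."""
    return n1 ** (2 * n2) * n4 ** (n3 - n2) * m ** n2

def m_star(n1, n2, n3, n4):
    """One more than the m-independent factor of the bound."""
    return 1 + n1 ** (2 * n2) * n4 ** (n3 - n2)

def separates(n1, n2, n3, n4):
    """True if m*^n1 copies of t* outnumber every tuple Q2 can produce."""
    m = m_star(n1, n2, n3, n4)
    return m ** n1 > upper_bound(n1, n2, n3, n4, m)

# Small grid of parameters with 0 < n2 < n1 and n3 >= n2.
checked = [
    (n1, n2, n3, n4)
    for n2 in range(1, 4)
    for n1 in range(n2 + 1, n2 + 4)
    for n3 in range(n2, n2 + 3)
    for n4 in range(1, 4)
]
all_ok = all(separates(*params) for params in checked)
print(m_star(2, 1, 2, 1), upper_bound(2, 1, 2, 1, 5), all_ok)
```

The check succeeds for every parameter choice because n1 ≥ n2 + 1 gives m*^(n1) ≥ m* × m*^(n2), and m* exceeds the remaining factor of the bound by construction.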

Consider an illustration to the proof of Lemma D.1.

EXAMPLE D.2. Let CQ queries Q7 and Q8 be defined as

Q7(X) :- p(X, Y), r(X), r(X).
Q8(X) :- p(X, Y), r(X).

in the setting of Example 4.1. To illustrate the proof
of Lemma D.1, we construct a counterexample database
to the claim that Q7 and Q8 are bag equivalent on all
databases that satisfy just the set-enforcing dependencies
of Example 4.1. We use the fact that query Q7 has
two copies of subgoal r(X), whereas Q8 has just one
copy of that subgoal. (Recall that relation R is not required
to be a set on all instances of the database schema
𝒟 of Example 4.1.)

Queries Q7 and Q8, as well as the database schema
𝒟 of Example 4.1 together with its set-enforcing constraints,
satisfy all the conditions of Lemma D.1. Observe
that query Q7' (defined in the statement of Theorem 4.2)
is isomorphic to Q7, because relation R is not
required to be a set. Similarly, query Q8' is isomorphic
to Q8. Further, the canonical representation of each of
Q7, Q8, Q7', and Q8' is isomorphic to query Q8.

Consider query Q8' and its canonical database D',
with P = {{(1, 2)}} and R = {{(1)}}. From D', we
construct a bag-valued database D, with relations P =
{{(1, 2)}} (same as P in D') and with m > 0 copies of tuple
(1) in relation R. That is, R = {{(1), ..., (1)}}, with
cardinality m of bag R in D. Let relations S, T, U be
empty sets in D. Then D satisfies all the set-enforcing
dependencies of Example 4.1.

Now using the notation of the proof of Lemma D.1,
we have n1 = 2. Here, n1 is the number of subgoals of
Q7' - and thus also of Q7 - whose predicate corresponds
to R. At the same time, n2 = 1 < n1, where n2 is the
number of subgoals of Q8' - and thus also of Q8 - whose
predicate corresponds to R. Further, the total number
n3 of subgoals of Q8 is n3 = 2, and the number n4 of
non-R subgoals of Q7 is n4 = 1.

It is easy to verify that the bag Q7(D, B) has
$m^{(n_1)} = m^2$ copies of tuple (1). At the same time, by the argument
justifying Equation 4 in the proof of Lemma D.1,
the total number of tuples (where each duplicate is counted
separately) in the bag Q8(D, B) cannot exceed

$$n_1^{(2 n_2)} \times n_4^{(n_3 - n_2)} \times m^{(n_2)} = 2^2 \times 1^{(2-1)} \times m^1 = 4m$$

tuples. It is easy to see that for any value m* of m
such that m* > 4, the number of copies of tuple (1) in
the bag Q7(D, B) is always going to be greater than the
cardinality of the bag Q8(D, B).

In fact, the upper bound of Equation 4 is not tight for
this example, as can be observed from the facts that

• the total number of copies of tuple (1) in bag Q8(D, B)
is m, and

• the core-set of the bag Q8(D, B) has no tuples other
than (1); therefore, the cardinality of the bag Q8(D, B)
is m as well. □
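The counts in Example D.2 can be reproduced for a concrete m. The sketch below fixes m = 5 (the smallest value above 4) and evaluates both queries; it assumes product-of-multiplicities bag semantics, and the helper `eval_bag` and the `Counter` encoding of bags are illustrative choices rather than notation from the paper.

```python
from collections import Counter

def eval_bag(head, body, db):
    """Bag-semantics evaluation of a CQ query (illustrative helper)."""
    answer = Counter()

    def extend(i, assignment, mult):
        if i == len(body):
            answer[tuple(assignment[v] for v in head)] += mult
            return
        rel, args = body[i]
        for tup, count in db.get(rel, Counter()).items():
            new = dict(assignment)
            if all(new.setdefault(v, c) == c for v, c in zip(args, tup)):
                extend(i + 1, new, mult * count)

    extend(0, {}, 1)
    return answer

m = 5
# D: P as in the canonical database D', R with m copies of tuple (1).
D = {"p": Counter({(1, 2): 1}), "r": Counter({(1,): m})}

q7 = [("p", ("X", "Y")), ("r", ("X",)), ("r", ("X",))]
q8 = [("p", ("X", "Y")), ("r", ("X",))]

q7_answer = eval_bag(("X",), q7, D)
q8_answer = eval_bag(("X",), q8, D)
bound = 4 * m  # the (non-tight) bound n1^(2*n2) * n4^(n3-n2) * m^n2

print(q7_answer[(1,)], sum(q8_answer.values()), bound)
```

With m = 5 the bag Q7(D, B) holds m² = 25 copies of (1), above both the bound 4m = 20 and the actual cardinality m = 5 of Q8(D, B).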



E. PROOFS OF THE THEOREMS ON 
SOUND CHASE STEPS 

We provide here representative parts of proofs for
Theorems 4.1 and 4.3. The idea of the complete proofs
is to show, for an arbitrary embedded dependency, one
of the following two things:

(1) Either using the dependency results in sound chase
steps, under the appropriate semantics, for all CQ queries,
in case the format of the dependency is described in the
applicable theorem (i.e., either Theorem 4.1 or Theorem 4.3).
Please see Proposition E.1 in Section E.2 for
an example of such a claim.

(2) Or using the dependency results in unsound chase,
in case the format of the dependency is not described in
the theorem for the respective query-evaluation semantics
(i.e., either Theorem 4.1 or Theorem 4.3). Please
see Propositions E.2 and E.3 in Section E.2 for examples
of such claims.

All the remaining proofs for Theorems 4.1 and 4.3 are
analogous to the proofs of Propositions E.1 through E.3.

E.1 Bag Projection

This subsection of the appendix defines bag projection.
We use the definition in the proof of Proposition E.1
in Section E.2.

Given positive integers m, k and i(1), ..., i(k), such
that for each j ∈ {1, ..., k} it holds that 1 ≤ i(j) ≤ m.
Then for an m-tuple t = (a_1, ..., a_m), we say that
a k-tuple t' = (a_{i(1)}, ..., a_{i(k)}) is a projection of t on
attributes in positions i(1), ..., i(k), denoted
t' = t[i(1), ..., i(k)].

Further, for the m, k and i(1), ..., i(k) as above and for
an m-ary relation P, a bag of tuples B is a bag projection
of P on attributes in positions i(1), ..., i(k), denoted
B = π_{i(1),...,i(k)}(P), if each tuple t ∈ P contributes to
B a separate tuple t' = t[i(1), ..., i(k)], and if B has
no other tuples. B can be interpreted as the answer
Q(D, B) on database {P} to query

Q(X_{i(1)}, ..., X_{i(k)}) :- p(X_1, ..., X_m),

where the predicate p corresponds to relation P.
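A minimal Python sketch of bag projection follows, assuming bags are stored as `collections.Counter` objects that map each tuple to its multiplicity. The function names `project_tuple` and `bag_projection` and the sample relation are illustrative, not taken from the paper.

```python
from collections import Counter

def project_tuple(t, positions):
    """Return t[i(1), ..., i(k)] for 1-based attribute positions."""
    return tuple(t[i - 1] for i in positions)

def bag_projection(relation, positions):
    """Each stored copy of a tuple contributes one separate projected tuple."""
    result = Counter()
    for t, count in relation.items():
        result[project_tuple(t, positions)] += count
    return result

# A ternary bag P with three stored tuples, two of them identical.
P = Counter({(1, 2, 5): 1, (1, 3, 5): 2})
B = bag_projection(P, [1, 3])
print(B)  # three copies of (1, 5): no duplicate elimination
```

Unlike set projection, the result keeps one output tuple per input tuple, so the cardinality of B always equals the cardinality of P.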

E.2 The Proofs 

Proposition E.1. Given a CQ query Q and a set
of embedded dependencies Σ. Under bag semantics for
query evaluation, a chase step Q ⇒^σ_B Q' using tgd σ ∈ Σ
is sound if Q ⇒^σ Q' is (tgd) key-based, and for each
subgoal s(p_ij) that the chase step adds to Q, relation P_ij
is set valued on all databases satisfying Σ. □

Proof. Let σ be of the form

σ : φ(X, Y) → ∃Z p1(Y1, Z1) ∧ ... ∧ pn(Yn, Zn),

with n > 0. Here, the set of variables in each Yi, i ∈
{1, ..., n}, is the maximal subset, in the set of variables
in Yi ∪ Zi, of the set of variables in Y. (We abuse the
notation by treating Yi ∪ Zi as a set of variables and
constants.) We show that the chase step Q ⇒^σ_B Q'
using σ is sound whenever for all i ∈ {1, ..., n} such
that pi(Yi, Zi) corresponds to a subgoal in Q' that is
not a subgoal of Q, it holds that (1) Yi is a superset
of the key of relation symbol Pi in 𝒟, and (2) Pi is set
valued in all databases with schema 𝒟.

By our assumption that σ is applicable to Q, (1) there
exists a mapping μ from a (not necessarily proper) superset
ξ of φ to a subset of subgoals of Q. By the same
assumption, (2) there does not exist a mapping μ' such
that μ' is an extension of μ and such that μ'(ψ) is also a
subset of subgoals of Q. Here, ψ is the right-hand side
of the tgd σ.

Consider a mapping ν from φ to the body of Q, such
that ν agrees with μ on all the variables in ξ (note that
all of Y are in ξ), and such that ν maps the subset of
variables Z in ψ − ξ (here, "ψ − ξ" is read as set difference
between sets of conjuncts ψ and ξ) into distinct fresh
variables. By definition of chase step for tgds, ν(ψ) adds
at least one subgoal to Q, which results in query Q'.
Let one such new subgoal S be the result of applying
ν to atom pi(Yi, Zi) in the ψ part of σ, for some i ∈
{1, ..., n}.



Consider an arbitrary database D with schema 𝒟,
such that D satisfies the dependencies Σ. To finalize our
proof, it remains to show that on D, the following two
relations are the same as bags: Q(D, B) and Q''(D, B),
where Q'' results from adding the subgoal S to the body
of Q. Here, each of Q(D, B) and Q''(D, B) is to be
computed under bag semantics for query evaluation.

Let bQ(D, B) be the relation, on D, for the body
of Q(D, B), and let bQ''(D, B) be the relation, on D,
for the body of Q''(D, B). Note that if bQ(D, B) and
bQ''(D, B) are the same bags modulo the columns of
bQ(D, B), then Q(D, B) and Q''(D, B) are the same
bags as well. (Recall that the heads of Q and Q'' are the
same by definition of Q''.) When we say "bQ(D, B) and
bQ''(D, B) are the same bags modulo the columns of
bQ(D, B)", the meaning is as follows: If we do bag projection
on bQ''(D, B) on just the columns of bQ(D, B),
then we will obtain precisely bQ(D, B). (Please see Appendix E.1
for the definition of bag projection.)

We now show that bQ(D, B) and bQ''(D, B) are the
same bags modulo the columns of bQ(D, B), which finalizes
our proof. The case where bQ(D, B) is empty is
trivial, thus we assume for the remainder of the proof
that bQ(D, B) is not an empty bag. Consider an assignment
mapping A that was used to obtain a tuple t
in bag bQ(D, B). By definition of (tgd) key-based chase
step for σ, there is exactly one way (up to duplicates of
stored tuples) to extend A, to obtain a (distinct) tuple
t' ∈ Pi, such that t' "matches" t according to the join
conditions between the body of Q and the new subgoal
S in Q''.¹¹ Further, from the fact that the relation Pi
is a set on D, we obtain that t' is a unique tuple (i.e., it
has no duplicates in Pi) that "matches" t in the above
sense. As a result, each single tuple in bQ(D, B) corresponds,
for the purposes of computing bQ''(D, B) from
bQ(D, B), to exactly one tuple in Pi.

Observe that the above procedure for computing
bQ''(D, B) from bQ(D, B) corresponds to a valid plan
for computing bQ''(D, B) from only the stored relations
in D. (This plan is a left-linear plan, such that Pi
is the right input of the top join-operator node in the
tree. For the basics on query-evaluation plans, please
see [H].) We conclude that bQ(D, B) and bQ''(D, B)
are the same bags modulo the columns of bQ(D, B). □

Proposition E.2. Given a CQ query Q and a set of
embedded dependencies Σ. Consider a key-based chase
step Q ⇒^σ Q' using tgd σ ∈ Σ,

σ : φ(X, Y) → ∃Z ψ(Y, Z).

Suppose that at least one relation Pi used in ψ is not set
valued. Further, suppose that in the chase step Q ⇒^σ
Q' using σ, Q' is obtained by adding to the body of Q
a new Pi-subgoal s(Pi) (possibly alongside other subgoals).¹²
Then under bag semantics for query evaluation,
the chase step Q ⇒^σ_B Q' using σ is not sound.
□

¹¹That is, the extension of A is a satisfying assignment for the
body of Q'' w.r.t. database D. In this and other proofs, we
can use "procedural" evaluation of queries under each of bag
and bag-set semantics. The correctness of this usage stems
from the fact that our definitions for query evaluation under
bag and bag-set semantics, see Section 2.2, are consistent
with the operational semantics of evaluating CQ queries in
the SQL standard, as shown in [4].

¹²I.e., in the chase step Q ⇒^σ Q', applying σ to Q may
generate other new subgoals besides the Pi-subgoal.

Proof. Let σ be of the form

σ : φ(X, Y) → ∃Z p1(Y1, Z1) ∧ ... ∧ pn(Yn, Zn),

with n > 0. Here, for each j ∈ {1, ..., n}, Yj is the
maximal subset of Y in the set Yj ∪ Zj; please see the proof
of Proposition E.1 for the notation. In addition, the
relation Pi for pi(Yi, Zi) is not a set-valued relation for
at least one i ∈ {1, ..., n}. Given this assumption on Pi,
the proof of the claim of Proposition E.2 is by providing
a bag-valued database D, such that D satisfies Σ and
such that Q(D, B) and Q'(D, B) are not the same bags.

We build the database D as follows. Let D' be the
canonical database for query Q'. We obtain D by adding
to D' a single duplicate of the tuple for the subgoal
s(Pi) of Q'. We now follow the reasoning in the proof
of Proposition E.1 to observe that the bag Q'(D, B)
has at least one more tuple than the bag Q(D, B), due
to the fact that the two identical tuples of relation Pi
add to Q'(D, B) an extra copy of at least one tuple in
Q(D, B). This observation concludes the proof. □

We now provide an illustration that shows the main
points of the proof of Proposition E.2.

EXAMPLE E.1. Consider a set Σ = {σ1, σ2} of
embedded dependencies, where

σ1 : p(X, Y) ∧ p(X, Z) → Y = Z.
σ2 : r(X, Y) → p(X, Y).

Observe that chase steps using σ2 are (tgd) key-based
in presence of the egd σ1, and that Σ does not include
dependencies that would restrict the relation P, in the
right-hand side of σ2, to be set valued.
Consider a CQ query Q defined as

Q(A) :- r(A, B).

Applying σ2 to the query Q, in chase step Q ⇒^{σ2} Q',
results in query Q' defined as

Q'(A) :- r(A, B), p(A, B).

We now illustrate the construction of the database
D in the proof of Proposition E.2. First, the canonical
database D' of Q' has relations R = {{(a, b)}} and
P = {{(a, b)}}. D is constructed from D' by adding to
relation P a duplicate of the tuple (a, b), that is, D has
relations R = {{(a, b)}} and P = {{(a, b), (a, b)}}. Note
that database D is bag valued and satisfies all the dependencies
in Σ.

Now by the bag semantics for query evaluation, Q(D, B)
= {{(a)}}, while Q'(D, B) = {{(a), (a)}}. Thus, database
D is a counterexample to Q ≡_{Σ,B} Q', which proves that
the chase step Q ⇒^{σ2}_B Q' using σ2 is not sound. □
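The counterexample of Example E.1 can be replayed in a few lines of Python. This is a sketch under the assumption that bag semantics multiplies the multiplicities of the stored tuples used by each satisfying assignment; `eval_bag` is a hypothetical helper, and the strings "a" and "b" stand for the constants of the canonical database.

```python
from collections import Counter

def eval_bag(head, body, db):
    """Bag-semantics evaluation of a CQ query (illustrative helper)."""
    answer = Counter()

    def extend(i, assignment, mult):
        if i == len(body):
            answer[tuple(assignment[v] for v in head)] += mult
            return
        rel, args = body[i]
        for tup, count in db.get(rel, Counter()).items():
            new = dict(assignment)
            if all(new.setdefault(v, c) == c for v, c in zip(args, tup)):
                extend(i + 1, new, mult * count)

    extend(0, {}, 1)
    return answer

# D: canonical database of Q' plus one duplicate of (a, b) in P.
D = {"r": Counter({("a", "b"): 1}), "p": Counter({("a", "b"): 2})}

# The egd (first attribute of P determines the second) holds on the
# distinct tuples of P; the tgd holds because (a, b) is in P.
egd_holds = len({t[0] for t in D["p"]}) == len(D["p"])
tgd_holds = all(t in D["p"] for t in D["r"])

q = [("r", ("A", "B"))]
q_prime = q + [("p", ("A", "B"))]
q_answer = eval_bag(("A",), q, D)
q_prime_answer = eval_bag(("A",), q_prime, D)
print(egd_holds, tgd_holds, q_answer[("a",)], q_prime_answer[("a",)])
```

The key constraint alone does not rule out the duplicate in P, which is why the key-based chase step still changes the answer under bag semantics.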

Proposition E.3. Given a CQ query Q and a set of
embedded dependencies Σ. Let σ ∈ Σ be a tgd,

σ : φ(X, Y) → ∃Z p1(Y1, Z1) ∧ ... ∧ pn(Yn, Zn),

with n > 0.¹³ Consider a chase step Q ⇒^σ Q' using σ,
such that Q' is obtained by adding to the body of
Q a new Pi-subgoal s(Pi) (possibly alongside other subgoals),
where s(Pi) corresponds to conjunct pi(Yi, Zi)
in the consequent ψ of σ. Suppose that Yi is not a superkey
of Pi. Then under bag-set semantics, the chase
step Q ⇒^σ_BS Q' using σ is not sound. □

¹³For the notation, please see proof of Proposition E.1.

(Proposition E.3 is formulated for the case of bag-set
semantics, which allows us to show the flavor of the
proofs that are required to establish Theorem 4.3.)

Proof. (Proposition E.3) Given the assumption that
for the conjunct pi(Yi, Zi) used in ψ, it holds that Yi is
not a superkey of Pi, the proof of the claim of Proposition E.3
is by providing a set-valued database D, such
that D satisfies Σ and such that Q(D, BS) and Q'(D, BS)
are not the same bags.

Fix i such that Yi in pi(Yi, Zi) is not a superkey of Pi
and such that pi(Yi, Zi) corresponds to a subgoal in Q'
that (subgoal) is not in Q, in chase step Q ⇒^σ Q'. We
begin the construction of the database D by building
the canonical database D' for query Q'. We obtain D
by adding to D' a single extra (nonduplicate) tuple for
the subgoal s(Pi) of Q', as follows.

Without loss of generality, let pi(Yi, Zi) be of the form
pi(Yi, Zi', Zi''), where Zi' is not empty, Yi ∪ Zi' is a superkey
of Pi, and no proper subset of Yi ∪ Zi' is a superkey
of Pi. Now suppose ν was the mapping used
to generate s(Pi) from pi(Yi, Zi', Zi'') in the chase step
Q ⇒^σ Q'. (See proof of Proposition E.1 for the details
on ν.) Then s(Pi) is of the form pi(A, C, E), where A
(C, E, respectively) is the image of Yi (of Zi', of Zi'',
respectively) under ν. By construction of the canonical
database D' of Q', the tuple for s(Pi) in relation Pi in
D' is (a, c, e). We construct the database D from D' by
adding to Pi of D' a tuple (a, c', e), such that at least
one constant in c' is not equal to the same-position constant
in c. By construction, database D is set valued
and satisfies the dependencies Σ. (Recall that no proper
subset of Yi ∪ Zi' in pi(Yi, Zi', Zi'') is a superkey of Pi.)

We now follow the reasoning in the proof of Proposition E.1
to observe that the bag Q'(D, BS) has at
least one more tuple than the bag Q(D, BS). The reason
is, tuples (a, c, e) and (a, c', e) in relation Pi add
to Q'(D, BS) an extra copy of at least one tuple in
Q(D, BS). This observation concludes the proof. □

We now provide an illustration that shows the main
points of the proof of Proposition E.3.

EXAMPLE E.2. Consider a set Σ = {σ} of embedded
dependencies, where

σ : r(X, Y) → p(X, Z).

Given a CQ query Q,

Q(A) :- r(A, B).

applying σ to the query Q results in query Q',

Q'(A) :- r(A, B), p(A, C).

Observe that chase step Q ⇒^σ_BS Q' using σ is not key-based,
as the set of all attributes of P is the only key of
P.

We now illustrate the construction of the database D
in the proof of Proposition E.3. First, the canonical
database D' of Q' has relations R = {(a, b)} and P =
{(a, c)}. D is constructed from D' by adding to relation
P a new tuple (a, d), that is, D has relations R = {(a, b)}
and P = {(a, c), (a, d)}. Note that database D is set
valued and satisfies the dependency σ.

Now by the bag-set semantics for query evaluation,
Q(D, BS) = {{(a)}}, whereas Q'(D, BS) = {{(a), (a)}}.
Thus, database D is a counterexample to Q ≡_{Σ,BS} Q',
which proves that the chase step Q ⇒^σ_BS Q' using σ is
not sound. □
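Example E.2 can be checked the same way under bag-set semantics, where stored relations are sets and each satisfying assignment contributes exactly one answer tuple. In the sketch below, `eval_bag` and `eval_bag_set` are hypothetical helpers, and the strings "a" through "d" stand for the constants of the example.

```python
from collections import Counter

def eval_bag(head, body, db):
    """Bag-semantics evaluation of a CQ query (illustrative helper)."""
    answer = Counter()

    def extend(i, assignment, mult):
        if i == len(body):
            answer[tuple(assignment[v] for v in head)] += mult
            return
        rel, args = body[i]
        for tup, count in db.get(rel, Counter()).items():
            new = dict(assignment)
            if all(new.setdefault(v, c) == c for v, c in zip(args, tup)):
                extend(i + 1, new, mult * count)

    extend(0, {}, 1)
    return answer

def eval_bag_set(head, body, db):
    """Bag-set semantics: stored relations are sets, the answer is a bag."""
    set_db = {name: Counter(set(rel)) for name, rel in db.items()}
    return eval_bag(head, body, set_db)

# Set-valued database D of Example E.2.
D = {"r": {("a", "b")}, "p": {("a", "c"), ("a", "d")}}

# The tgd holds: every R-tuple has a P-tuple with the same first value.
tgd_holds = all(any(p[0] == r[0] for p in D["p"]) for r in D["r"])

q = [("r", ("A", "B"))]
q_prime = q + [("p", ("A", "C"))]
q_answer = eval_bag_set(("A",), q, D)
q_prime_answer = eval_bag_set(("A",), q_prime, D)
print(tgd_holds, q_answer[("a",)], q_prime_answer[("a",)])
```

Because the first attribute is not a superkey of P, the new subgoal p(A, C) has two matches for the single R-tuple, and the multiplicity of (a) doubles even though D contains no duplicates.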

F. COUNTEREXAMPLE DATABASE FOR 
EXAMPLE 5.1 

This section of the appendix provides the counterexample
database for Example 5.1. Database D is a counterexample
to soundness of chase step Q4 ⇒_B Q4^(1). In
D, let the relations be as follows: P = {{(1, 2)}}, R = ∅,
S = {{(1, 3)}}, T = {{(1, 4, 5), (1, 6, 7)}}, and U = ∅.
Note that D ⊨ Σ, for the set of dependencies Σ in Example 5.1.

On this database D, the answer to Q4 is Q4(D, B) =
{{(1)}}, whereas Q4^(1)(D, B) = {{(1), (1)}}, by rules of
bag semantics. From the fact that Q4(D, B) and Q4^(1)(D, B)
are not the same bags, we conclude that the chase step
Q4 ⇒_B Q4^(1) is not sound under bag semantics.

G. UNIQUENESS THEOREMS FOR CHASE 
RESULTS 

We begin this section of the appendix by formulating
the version of Theorem 5.1 for the case of bag-set
semantics.

Theorem G.1. Given a CQ query Q and a set Σ
of embedded dependencies on database schema 𝒟, such
that there exists a chase result (Q)_{Σ,S} for Q and Σ under
set semantics. Then there exists a result (Q)_{Σ,BS}
of sound chase for Q and Σ under bag-set semantics,
unique up to isomorphism of its canonical representation.¹⁴
That is, for two sound-chase results (Q)^(1)_{Σ,BS}
and (Q)^(2)_{Σ,BS} for Q and Σ, (Q)^(1)_{Σ,BS} ≡_BS (Q)^(2)_{Σ,BS} in
the absence of dependencies. □

We now provide a proof for Theorem 15.11 An adap- 
tation of the proof to the statement of Theorem IG.ll is 
straightforward. 

Proof. (Theorem 5.1) We first establish that, by the definition of soundness of the chase result $(Q)^{(1)}_{\Sigma,B}$, there exists a chase sequence $C_1$ using $\Sigma$, such that $C_1$ starts with $Q$ and ends with $(Q)^{(1)}_{\Sigma,B}$, and such that all chase steps in $C_1$ are sound under bag semantics. Similarly, we establish that there exists a chase sequence $C_2$ using $\Sigma$, such that $C_2$ starts with $Q$ and ends with $(Q)^{(2)}_{\Sigma,B}$, and such that all chase steps in $C_2$ are sound under bag semantics.

The proof of Theorem 5.1 is by contradiction. Assume, toward contradiction, that $(Q)^{(1)}_{\Sigma,B}$ and $(Q)^{(2)}_{\Sigma,B}$ are not isomorphic after removal of duplicate subgoals that correspond to set-valued relations in the database schema $\mathcal{D}$. Let us denote by $(\overline{Q})^{(1)}_{\Sigma,B}$ the result of removing such "set-valued" duplicate subgoals from $(Q)^{(1)}_{\Sigma,B}$, and let us use the analogous notation $(\overline{Q})^{(2)}_{\Sigma,B}$ for $(Q)^{(2)}_{\Sigma,B}$. Suppose, w.l.o.g., that $(\overline{Q})^{(1)}_{\Sigma,B}$ has a nonempty set of subgoals $p_1(\bar{X}_1), \ldots, p_m(\bar{X}_m)$ such that this set of subgoals does not have a counterpart in the image of any injective homomorphism from $(\overline{Q})^{(1)}_{\Sigma,B}$ to $(\overline{Q})^{(2)}_{\Sigma,B}$. It is clear that $p_1(\bar{X}_1), \ldots, p_m(\bar{X}_m)$ cannot be a subset of all the subgoals in the body of the original query $Q$. (By definition of sound chase steps, no chase steps using embedded dependencies ever remove original query subgoals.) Then, from the sound chase sequence $C_1$, we can form a sequence $C_1'$ of sound chase steps that (i) uses a subsequence of the sequence of dependencies applied in $C_1$, and (ii) starts with $Q$ and ends with adding all the subgoals in $p_1(\bar{X}_1), \ldots, p_m(\bar{X}_m)$ to $Q$. By definition of sound chase, there must exist a nonempty suffix subsequence $C''$ of $C_1'$ such that all chase steps in $C''$ apply to the chase result $(Q)^{(2)}_{\Sigma,B}$ of chase sequence $C_2$, and such that applying the respective (to $C''$) dependencies in $\Sigma$ to $(Q)^{(2)}_{\Sigma,B}$ would result in adding to $(Q)^{(2)}_{\Sigma,B}$ a set of subgoals that would be an image of $p_1(\bar{X}_1), \ldots, p_m(\bar{X}_m)$ in some injective homomorphism from $(\overline{Q})^{(1)}_{\Sigma,B}$ to $(\overline{Q})^{(2)}_{\Sigma,B}$. We thus arrive at a contradiction with the condition of Theorem 5.1 which states that $(Q)^{(2)}_{\Sigma,B}$ is a (terminal) result of sound chase for $Q$ using $\Sigma$ under bag semantics. (That is, the contradiction is with the assumption that no sound chase steps of the form $(Q)^{(2)}_{\Sigma,B} \Rightarrow^{\sigma}_{B} Q'$ are possible, where $\sigma \in \Sigma$.)

* See Theorem ?? in Section ??.

The case where some of $p_1(\bar{X}_1), \ldots, p_m(\bar{X}_m)$ were eliminated in $(Q)^{(2)}_{\Sigma,B}$ by one or more egds in $\Sigma$ is analogous to the above tgd case, except that the contradiction in the case of egds is with our assumption that $(Q)^{(1)}_{\Sigma,B}$ is a (terminal) result of sound chase for $Q$ using $\Sigma$ under bag semantics. That is, those same egds can be applied to $(Q)^{(1)}_{\Sigma,B}$, hence $(Q)^{(1)}_{\Sigma,B}$ is not a result of sound chase under bag semantics. □

H. COMPLEXITY OF SOUND CHASE 

In this section of the appendix, we establish for Theorem 5.2 the lower bound on the complexity of sound chase under each of bag and bag-set semantics, using sets of weakly acyclic dependencies.

H.1 Weakly Acyclic Dependencies

We provide here the definition and discussion of [11] for weakly acyclic dependencies.

The chase-termination property under set semantics is in general undecidable for CQ queries and dependencies given by tgds and egds. However, weak acyclicity of a set of dependencies is sufficient to guarantee that any chase sequence terminates. This is the least restrictive sufficient termination condition that has been generally studied in the literature (but see [10] for a generalization). The weak acyclicity condition appears to hold in all practical scenarios.



Definition H.1. (Weakly acyclic set of dependencies) Let $\Sigma$ be a set of tgds over a fixed schema. Construct a directed graph, called the dependency graph, as follows: (1) there is a node for every pair $(R, A)$, with $R$ a relation symbol of the schema and $A$ an attribute of $R$; call such pair $(R, A)$ a position; (2) add edges as follows: for every tgd $\phi(\bar{X}) \rightarrow \exists \bar{Y}\, \psi(\bar{X}, \bar{Y})$ in $\Sigma$, and for every $X$ in $\bar{X}$ that occurs in $\psi$:

For every occurrence of $X$ in $\phi$ in position $(R, A_i)$:

(a) for every occurrence of $X$ in $\psi$ in position $(S, B_j)$, add an edge $(R, A_i) \rightarrow (S, B_j)$;

(b) in addition, for every existentially quantified variable $Y$ and for every occurrence of $Y$ in $\psi$ in position $(T, C_k)$, add a special edge $(R, A_i) \stackrel{*}{\rightarrow} (T, C_k)$.

Note that there may be two edges in the same direction between two nodes, if exactly one of the two edges is special. Then $\Sigma$ is weakly acyclic if the dependency graph has no cycle going through a special edge. We say that a set of tgds and egds is weakly acyclic if the set of all its tgds is weakly acyclic. □
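The construction in Definition H.1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not an implementation from the paper: a tgd is encoded as a pair `(body, head)` of atom lists, an atom is `(relation_name, tuple_of_variables)`, positions are `(relation_name, zero_based_index)`, and every head variable that does not occur in the body is treated as existentially quantified. A cycle passes through a special edge exactly when the source of that edge is reachable from its target.

```python
from collections import defaultdict

def positions(atoms, var):
    """All positions (relation, index) at which variable var occurs in atoms."""
    return [(rel, i) for rel, args in atoms for i, a in enumerate(args) if a == var]

def dependency_graph(tgds):
    """Return the sets of regular and special edges of Definition H.1."""
    regular, special = set(), set()
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        head_vars = {v for _, args in head for v in args}
        existential = head_vars - body_vars
        for x in body_vars & head_vars:          # X occurs in phi and in psi
            for src in positions(body, x):
                for dst in positions(head, x):   # case (a): regular edge
                    regular.add((src, dst))
                for y in existential:            # case (b): special edge
                    for dst in positions(head, y):
                        special.add((src, dst))
    return regular, special

def is_weakly_acyclic(tgds):
    """True iff no cycle of the dependency graph goes through a special edge."""
    regular, special = dependency_graph(tgds)
    succ = defaultdict(set)
    for u, v in regular | special:
        succ[u].add(v)

    def reachable(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return False

    return not any(reachable(v, u) for u, v in special)

# p1(X,Y) -> exists Z p2(Z,X): no cycle at all.
acyclic_tgd = ([("p1", ("X", "Y"))], [("p2", ("Z", "X"))])
# Hypothetical tgd p(X,Y) -> exists Z p(Y,Z): special self-loop on position (p, 1).
cyclic_tgd = ([("p", ("X", "Y"))], [("p", ("Y", "Z"))])
print(is_weakly_acyclic([acyclic_tgd]), is_weakly_acyclic([cyclic_tgd]))
```

The second tgd is a made-up example chosen only because its dependency graph has a special edge from position (p, 1) to itself.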

Theorem H.1. [??] If $\Sigma$ is a weakly acyclic set of tgds and egds, then the chase with $\Sigma$ of any CQ query $Q$ under set semantics terminates in finite time. □

The complexity of the chase. For a fixed database schema and set $\Sigma$ of dependencies, if $\Sigma$ is weakly acyclic then under set semantics any chase sequence terminates in polynomial time in the size of the query being chased (as shown in [??]). The fixed-size assumption about schemas and dependencies is often justified in practice, where one is usually interested in repeatedly reformulating incoming queries for the same setting with schemas and dependencies. Nonetheless, the degree of the polynomial depends on the size of the dependencies and care is needed to implement the chase efficiently. Successive implementations have shown that in practical situations the chase is eminently usable [11].

The complexity of reformulation under set semantics (in C&B). Assume that under set semantics the chase of any query with $\Sigma$ terminates in polynomial time (for fixed database schema). Then checking whether a CQ query $Q$ admits a reformulation is NP-complete in the size of $Q$. Checking whether a given query $Q'$ is a $\Sigma$-minimal reformulation of $Q$ is NP-complete in the sizes of $Q$ and $Q'$. For arbitrary sets of dependencies (for which the chase may not even terminate), the above problems are undecidable.

H.2 The Lower Complexity Bound 

We now establish for Theorem 5.2 the lower bound on the complexity of sound chase using weakly acyclic dependencies under each of bag and bag-set semantics, as follows.

EXAMPLE H.1. On a database schema $\mathcal{D} = \{P_1, P_2, \ldots, P_m\}$ where each relation symbol has arity 2, consider a query $Q$ with a single subgoal $p_1$:

Q(X,Y) :- p_1(X,Y).

Suppose the database schema $\mathcal{D}$ satisfies a set $\Sigma$ of tgds of the following form:

$\sigma^{(1)}_{i,j} : p_i(X,Y) \rightarrow \exists Z\ p_j(Z,X)$
$\sigma^{(2)}_{i,j} : p_i(X,Y) \rightarrow \exists W\ p_j(Y,W)$

$\Sigma$ has one tgd $\sigma^{(1)}_{i,j}$ and one tgd $\sigma^{(2)}_{i,j}$ for each pair $(i,j)$, where $i \in \{1, \ldots, m-1\}$ and $j \in \{i+1, \ldots, m\}$. Thus, the number of dependencies in $\Sigma$ is quadratic in $m$.

We show one partial chase result (under set semantics) of the query $Q$ under dependencies $\Sigma$, for $m \geq 2$:

Q'(X,Y) :- p_1(X,Y), p_2(Z_1,X), p_2(Y,Z_2).

$Q'$ is the result of applying to $Q$ tgds $\sigma^{(1)}_{1,2}$ and $\sigma^{(2)}_{1,2}$. Observe that $Q'$ has a self-join of the relation $P_2$. □

For the terminal result $(Q)_{\Sigma,S}$ of chase of the query $Q$ using the tgds $\Sigma$ under set semantics in Example H.1, we can show that the size of $(Q)_{\Sigma,S}$ is exponential in the size of $Q$ and $\Sigma$. Specifically, the size of $(Q)_{\Sigma,S}$ is exponential in the size $m$ of the database schema $\mathcal{D}$. Intuitively, just as $Q'$ has two subgoals for predicate $p_2$, the query $(Q)_{\Sigma,S}$ has two subgoals for $p_2$, four subgoals for $p_3$, and so on.

EXAMPLE H.2. We continue Example H.1. We build a set $\Sigma'$ of dependencies from the set $\Sigma$ of Example H.1 by adding $3m$ functional dependencies (fds): For each $i \in \{1, \ldots, m\}$, we add the following three fds for the relation $P_i$ in $\mathcal{D}$:

$\sigma^{(3)}_{i} : p_i(X,Y) \wedge p_i(X,Z) \rightarrow Y = Z$

$\sigma^{(4)}_{i} : p_i(Y,X) \wedge p_i(Z,X) \rightarrow Y = Z$

$\sigma^{(5)}_{i} : p_i(X,Y,Z_1) \wedge p_i(X,Y,Z_2) \rightarrow Z_1 = Z_2$

That is, in all databases that satisfy the first two fds for $i$ in $\Sigma'$, the core-set of $P_i$ does not have repeated values of either attribute. The third fd for $P_i$ guarantees that relation $P_i$ is set valued in all instances of the database schema $\mathcal{D}$. Here, the third attribute of $P_i$ is its tuple-id attribute. Please see Appendix ?? for the details on using egds for enforcing set-valuedness of relations in all instances of a given database schema.

Note that the addition of these fds transforms the tgds $\Sigma$ of Example H.1 into key-based tgds $\Sigma'$ (see Definition 5.1). Thus, for the terminal result $(Q)_{\Sigma',B}$ of sound chase of the query $Q$ under the dependencies $\Sigma'$ under bag semantics, the size of $(Q)_{\Sigma',B}$ is exponential in the size of $Q$ and $\Sigma'$. The same relationship holds under bag-set semantics between the size of $(Q)_{\Sigma',BS}$ and the sizes of $Q$ and $\Sigma'$. □

By the results of Section 4, chase of CQ query $Q$ under key-based tgds $\Sigma$ results in a query that is equivalent to $Q$ under $\Sigma$ under each of bag and bag-set semantics for query evaluation. Observing that the dependencies $\Sigma'$ of Example H.2 are weakly acyclic (and, in fact, strictly acyclic) completes the construction of the infinite family of pairs $(Q, \Sigma')$, one pair for each natural-number value of $m$, such that the size of each of $(Q)_{\Sigma',B}$ and $(Q)_{\Sigma',BS}$ (both constructed using sound chase) is polynomial in the size of $Q$ and exponential in the size of $\Sigma'$.

I. SATISFIABLE DEPENDENCIES ARE 
QUERY BASED 

In this section of the appendix we provide Theorem I.1, which is the analog of Theorem 5.3 for the case of bag-set semantics. We then supply a proof of Theorem 5.3; the proof of Theorem I.1 is similar. Finally, we outline the counterpart of algorithm Max-Bag-$\Sigma$-Subset (of Section 5.3) for the case of bag-set semantics.

Theorem I.1. (Unique $\Sigma^{max}_{BS}(Q,\Sigma) \subseteq \Sigma$) Given a CQ query $Q$ and set $\Sigma$ of embedded dependencies, such that there exists a set-chase result $(Q)_{\Sigma,S}$ for $Q$ and $\Sigma$. Let $Q_n$ be the result of sound chase for $Q$ and $\Sigma$ under bag-set semantics, with canonical database $D^{(Q_n)}$. Then there exists a unique subset $\Sigma^{max}_{BS}(Q,\Sigma)$ of $\Sigma$, such that:

• $D^{(Q_n)} \models \Sigma^{max}_{BS}(Q,\Sigma)$, and

• for each proper superset $\Sigma'$ of $\Sigma^{max}_{BS}(Q,\Sigma)$ such that $\Sigma' \subseteq \Sigma$, $D^{(Q_n)} \models \Sigma'$ does not hold. □

We now turn to the proof of Theorem 5.3. We first observe that the process of sound chase of a CQ query using a set $\Sigma$ of embedded dependencies under bag semantics can be modeled as state transitions for $\Sigma$, with certain conditions on the final state, which corresponds to obtaining the result of the chase. The termination conditions are formalized in Proposition I.1; we first set up the terminology required to formulate Proposition I.1.

Suppose we are given a CQ query $Q$ and a finite set $\Sigma$ of embedded dependencies, such that there exists a set-chase result $(Q)_{\Sigma,S}$ for $Q$ and $\Sigma$. Consider an arbitrary chase sequence $C = Q_0, Q_1, \ldots$, such that (i) $Q_0 = Q$, and (ii) every query $Q_{i+1}$ ($i \geq 0$) in $C$ is obtained from $Q_i$ by a sound chase step $Q_i \Rightarrow^{\sigma}_{B} Q_{i+1}$ using a dependency $\sigma \in \Sigma$. By Proposition 5.1 the chase sequence $C$ is finite, that is, $C = Q_0, Q_1, \ldots, Q_n$ such that $n \in \mathbb{N} \cup \{0\}$ and such that query $Q_n = (Q)_{\Sigma,B}$. Moreover, by Theorem 5.1 we have that the query $Q_n$ is bag-equivalent in the absence of dependencies to the terminal queries in all sound-chase sequences for $Q$ and $\Sigma$ under bag semantics.

Given a chase sequence $C$ as defined above, with chase result $Q_n = (Q)_{\Sigma,B}$, we assign a unique ID to each subgoal of $Q_n$. We then "propagate the IDs back" to all the queries in $C$, so that the enumeration of the subgoals is consistent across all the elements of $C$. If extra subgoals are encountered in non-terminal elements of $C$, we assign unique IDs to those subgoals as well. (The only case when a query $Q_i$, $i < n$, in $C$ could have an extra subgoal compared to $Q_n$ is when the procedure of dropping duplicate subgoals has been applied to either $Q_i$ or its successors in $C$. See Theorems 2.1, 4.1, and 221.) In what follows, we refer to the $j$th subgoal of query $Q_i$ as $s^{(i)}_j$.

Fix an arbitrary $i \in \{0, \ldots, n\}$, and consider query $Q_i$ in the chase sequence $C$. Given an arbitrary dependency $\sigma \in \Sigma$, of the form $\sigma : \phi(\bar{U}, \bar{W}) \rightarrow \exists \bar{V}\ \psi(\bar{U}, \bar{V})$, we define the state of $\sigma$ w.r.t. $Q_i$ in $C$ as follows:

• Dependency $\sigma$ is pre-applicable to $Q_i$ if the chase of none of $Q_0, Q_1, \ldots, Q_i$ with $\sigma$ is applicable; that is, for each $j \in \{0, \ldots, i\}$, there does not exist a homomorphism from the left-hand side $\phi$ of $\sigma$ to the body of the query $Q_j$.

• Dependency $\sigma$ is soundly applicable to set of subgoals $S = \{s^{(i)}_{j_1}, \ldots, s^{(i)}_{j_k}\}$ of query $Q_i$, for some $k > 0$, if there exists a proper subset $\theta$, of size $k' \geq k$, of $\phi \wedge \psi$ (of $\sigma$), with the following properties:

  - $\theta$ is a superset of $\phi$;

  - there exists a homomorphism $h$ from $\theta$ to exactly the subgoals $s^{(i)}_{j_1}, \ldots, s^{(i)}_{j_k}$ of query $Q_i$, such that $h$ cannot be extended to a homomorphism from $\phi \wedge \psi$ to the body of the query $Q_i$ (see Section 2.4 for further details on this definition); and

  - chase step $Q_i \Rightarrow^{\sigma}_{B} Q'$, where $Q'$ is a CQ query, is sound; that is, $Q' \equiv_{\Sigma,B} Q_i$.

• Dependency $\sigma$ is unsoundly applicable to set of subgoals $S = \{s^{(i)}_{j_1}, \ldots, s^{(i)}_{j_k}\}$ of query $Q_i$, for some $k > 0$, if there exists a proper subset $\theta$, of size $k' \geq k$, of $\phi \wedge \psi$ (of $\sigma$), with the following properties:

  - $\theta$ is a superset of $\phi$;

  - there exists a homomorphism $h$ from $\theta$ to exactly the subgoals $s^{(i)}_{j_1}, \ldots, s^{(i)}_{j_k}$ of query $Q_i$, such that $h$ cannot be extended to a homomorphism from $\phi \wedge \psi$ to the body of the query $Q_i$ (see Section 2.4 for further details on this definition); and

  - chase step $Q_i \Rightarrow^{\sigma}_{B} Q'$, where $Q'$ is a CQ query, is unsound; that is, $Q' \equiv_{\Sigma,B} Q_i$ does not hold.

• Finally, dependency $\sigma$ is post-applicable to $Q_i$ (assuming $i > 0$) if (a) $\sigma$ is neither soundly applicable nor unsoundly applicable to $Q_i$, and (b) there exists a $j \in \{0, \ldots, i-1\}$ such that $\sigma$ has been used in a sound chase step $Q_j \Rightarrow^{\sigma}_{B} Q_{j+1}$. Observe that in this case, by definition of (sound) chase steps there exists a homomorphism from the conjunction of the left-hand side $\phi$ of $\sigma$ with the right-hand side $\psi$ of $\sigma$ to the body of the query $Q_i$.

(Footnote: Other than the set-enforcing dependencies on stored relations, see Theorem 5.1.)

In the above definition of the state of $\sigma \in \Sigma$ w.r.t. $Q_i$ in $C$, the only difference between the states "soundly applicable" and "unsoundly applicable" is the soundness property of the chase step in question. Specifically, in the state "$\sigma$ is soundly applicable to $Q_i$" the chase step $Q_i \Rightarrow^{\sigma}_{B} Q'$ is sound under bag semantics, whereas in the state "$\sigma$ is unsoundly applicable to $Q_i$", the chase step $Q_i \Rightarrow^{\sigma}_{B} Q'$ is unsound.

We now define the state of the set of embedded dependencies $\Sigma$ w.r.t. $Q_i$ in $C$, as a total mapping $s^{\Sigma}_i$ from $\Sigma$ to the set of the four above states (pre-applicable, post-applicable, soundly-applicable, and unsoundly-applicable), where the state $s^{\Sigma}_i(\sigma)$ of each $\sigma \in \Sigma$ w.r.t. $Q_i$ in $C$ is as follows:

• $s^{\Sigma}_i(\sigma)$ = "soundly-applicable" if and only if there exists a set $S$ of subgoals of $Q_i$ such that $\sigma$ is soundly applicable to $S$ in $Q_i$;

• $s^{\Sigma}_i(\sigma)$ = "unsoundly-applicable" if and only if there exists no subset $S$ of subgoals of $Q_i$ such that $\sigma$ is soundly applicable to $S$ in $Q_i$, and there exists a set $S'$ of subgoals of $Q_i$ such that $\sigma$ is unsoundly applicable to $S'$ in $Q_i$;

• $s^{\Sigma}_i(\sigma)$ = "post-applicable" if $\sigma$ and $Q_i$ satisfy the conditions (a) and (b) of post-applicability, see above; and

• $s^{\Sigma}_i(\sigma)$ = "pre-applicable" if $\sigma$ and $Q_i$ satisfy the conditions of pre-applicability, see above.

We now establish straightforward facts about the states of $\Sigma$ w.r.t. particular queries in the sound-chase sequence $C = Q_0, \ldots, Q_n$, where $Q_n$ is the result of the sound chase of $Q$ using $\Sigma$ under bag semantics. The proofs of all the claims in Proposition I.1 are immediate from the definitions in this section of the appendix and from the definitions of chase steps, see Section ??.

Proposition I.1. Consider a CQ query $Q$ and a set of embedded dependencies $\Sigma$ such that there exists a set-chase result $(Q)_{\Sigma,S}$ for $Q$ and $\Sigma$. Let $C = Q_0, \ldots, Q_n$ be a sound-chase sequence for $Q$ and $\Sigma$ under bag semantics. In $C$, $Q_0 = Q$, and $Q_n$ is the result $(Q)_{\Sigma,B}$ of the sound chase of $Q$ using $\Sigma$ under bag semantics. Then the following holds about the states of $\Sigma$ w.r.t. queries in $C$.

1. Suppose that in the state $s^{\Sigma}_0$ of $\Sigma$ w.r.t. query $Q_0$ in chase sequence $C$, for all $\sigma \in \Sigma$ it holds that $s^{\Sigma}_0(\sigma)$ is either "pre-applicable" or "unsoundly-applicable". Then $C = Q_0$. That is, $Q$ is isomorphic to $(Q)_{\Sigma,B}$.

2. Consider the state $s^{\Sigma}_n$ of $\Sigma$ w.r.t. query $Q_n$ in chase sequence $C$. Then for all $\sigma \in \Sigma$ it must hold that $s^{\Sigma}_n(\sigma)$ is one of "pre-applicable", "post-applicable", and "unsoundly-applicable".

3. For an arbitrary $i \in \{0, \ldots, n-1\}$ (assuming $n > 0$), consider the state $s^{\Sigma}_i$ of $\Sigma$ w.r.t. query $Q_i$ and the state $s^{\Sigma}_{i+1}$ of $\Sigma$ w.r.t. query $Q_{i+1}$. Then

(a) there must exist a $\sigma^* \in \Sigma$ such that $s^{\Sigma}_i(\sigma^*)$ is "soundly applicable", and

(b) for each $\sigma \in (\Sigma - \{\sigma^*\})$, $s^{\Sigma}_i(\sigma) = s^{\Sigma}_{i+1}(\sigma)$. □

We are now ready to prove Theorem 5.3.

Proof. (Theorem 5.3) Consider a fixed pair $(Q, \Sigma)$ satisfying the conditions of Theorem 5.3, and let $Q_n$ be the result of sound chase for $Q$ and $\Sigma$ under bag semantics, with canonical database $D^{(Q_n)}$. We show that the set $\Sigma^{max}_{B}(Q, \Sigma)$ is the result of removing from $\Sigma$ exactly those tgds $\sigma$ such that the chase step $Q_n \Rightarrow^{\sigma}_{B} Q'$, with some CQ query $Q'$ being the outcome of the chase step, is unsound under bag semantics. This claim is, in fact, immediate from Proposition I.1, in which it is shown that, for each dependency $\sigma$ in $\Sigma$ such that $\sigma$ is applicable to $Q_n$, $\sigma$ is unsoundly applicable to $Q_n$. □

Finally, we outline the counterpart of algorithm Max-Bag-$\Sigma$-Subset (of Section 5.3) for the case of bag-set semantics.



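Algorithm 2: Max-Bag-Set-$\Sigma$-Subset($Q$, $\Sigma$)

Input: CQ query $Q$, set $\Sigma$ of embedded dependencies such that chase result $(Q)_{\Sigma,S}$ exists
Output: $\Sigma^{max}_{BS}(Q,\Sigma) \subseteq \Sigma$ s.t.
    (1) $D^{((Q)_{\Sigma,BS})} \models \Sigma^{max}_{BS}(Q,\Sigma)$, and
    (2) $\forall\, \Sigma'$ such that $\Sigma^{max}_{BS}(Q,\Sigma) \subset \Sigma' \subseteq \Sigma$, $D^{((Q)_{\Sigma,BS})} \not\models \Sigma'$

1. $(Q)_{\Sigma,BS}$ := soundChase($BS$, $Q$, $\Sigma$);
2. $\Sigma^{max}_{BS}(Q,\Sigma)$ := $\Sigma$;
3. for each $\sigma$ in $\Sigma$ do
4.     if soundChaseStep($\sigma$, $BS$, $(Q)_{\Sigma,BS}$) = false then
5.         $\Sigma^{max}_{BS}(Q,\Sigma)$ := $\Sigma^{max}_{BS}(Q,\Sigma) - \{\sigma\}$;
6. return $\Sigma^{max}_{BS}(Q,\Sigma)$;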



The correctness and complexity results for Max-Bag-Set-$\Sigma$-Subset are the same as their counterparts for algorithm Max-Bag-$\Sigma$-Subset, see Theorem ?? and Section ?? for the details.
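The control flow of Algorithm 2 is simple enough to sketch directly. In the sketch below, `sound_chase` and `sound_chase_step` are hypothetical callables standing in for the procedures soundChase and soundChaseStep named in the algorithm; their signatures and the string tag `"BS"` are assumptions made for illustration, and the stubs in the usage lines do not perform any real chase.

```python
def max_bag_set_sigma_subset(query, sigma, sound_chase, sound_chase_step):
    """Sketch of the loop structure of Max-Bag-Set-Sigma-Subset.

    sound_chase(semantics, query, sigma) is assumed to return the terminal
    sound-chase result; sound_chase_step(dep, semantics, chased_query) is
    assumed to return False for exactly the dependencies to be dropped.
    """
    chased = sound_chase("BS", query, sigma)                # line 1
    result = set(sigma)                                     # line 2
    for dep in sigma:                                       # line 3
        if not sound_chase_step(dep, "BS", chased):         # line 4
            result.discard(dep)                             # line 5
    return frozenset(result)                                # line 6

# Usage with stub procedures (placeholders, not the paper's chase):
stub_chase = lambda semantics, q, deps: q
stub_step = lambda dep, semantics, q: dep != "sigma2"
subset = max_bag_set_sigma_subset("Q", ["sigma1", "sigma2", "sigma3"],
                                  stub_chase, stub_step)
print(sorted(subset))
```

Passing the two procedures as parameters keeps the sketch independent of any particular chase implementation.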

J. PROOFS OF $\Sigma$-EQUIVALENCE TESTS
FOR CQ QUERIES

To prove Theorems 6.1 and 6.2 we first make a straightforward observation, as follows.

Proposition J.1. Given two queries $Q$ and $Q'$ and a set of embedded dependencies $\Sigma$. Let $X$ be one of $B$, $BS$, $S$, which stand for bag, bag-set, and set semantics, respectively. Then $Q \equiv_{X} Q'$ implies $Q \equiv_{\Sigma,X} Q'$. □

The proof of Proposition J.1 is straightforward from the definitions of query equivalence in presence and in the absence of dependencies.

The proof of Theorem 6.1 is immediate from Propositions 5.1 and J.1, from Theorem 5.1, and from Lemmas J.1 and J.2. Similarly, the proof of Theorem 6.2 is immediate from Propositions 5.1 and J.1, from the analog of Theorem 5.1 for bag-set semantics (see Theorem G.1), and from straightforward analogs of Lemmas J.1 and J.2 for the case of bag-set semantics for query evaluation.

Lemma J.1. Given CQ queries $Q$ and $Q'$, and given a set of embedded dependencies $\Sigma$ on schema $\mathcal{D}$ such that there exist set-chase results $(Q)_{\Sigma,S}$ for $Q$ and $(Q')_{\Sigma,S}$ for $Q'$. Then $Q \equiv_{\Sigma,B} Q'$ implies $(Q)_{\Sigma,B} \equiv_{B} (Q')_{\Sigma,B}$ in the absence of all dependencies other than the set-enforcing dependencies on $\mathcal{D}$. □

Proof. First, from Proposition 5.1 we obtain that sound chase of each of $Q$ and $Q'$ using $\Sigma$ is guaranteed to terminate under bag semantics. Further, from Theorem 5.1 it follows that there exist (1) a unique result $(Q)_{\Sigma,B}$ of sound chase for $Q$, and (2) a unique result $(Q')_{\Sigma,B}$ of sound chase for $Q'$. Both results are unique in the absence of all dependencies other than the set-enforcing dependencies on $\mathcal{D}$; call these set-enforcing dependencies $\Sigma' \subseteq \Sigma$.

From $Q \equiv_{\Sigma,B} Q'$ and by the soundness of chase in obtaining $(Q)_{\Sigma,B}$ and $(Q')_{\Sigma,B}$, we have $(Q)_{\Sigma,B} \equiv_{\Sigma,B} (Q')_{\Sigma,B}$. That is, on each bag-valued database $D$ that satisfies $\Sigma$, we have that $Q(D,B)$ and $Q'(D,B)$ are the same as bags.

To show that $(Q)_{\Sigma,B} \equiv_{B} (Q')_{\Sigma,B}$ in the absence of all dependencies other than $\Sigma'$, it remains to prove that $Q(D,B)$ and $Q'(D,B)$ are also the same as bags on each database $D$ that does not satisfy $\Sigma$ but does satisfy $\Sigma'$. There are two cases:

Case 1: Suppose $D$ violates only those dependencies that are not relevant in sound chase to either $Q$ or $Q'$. (In the terminology of Section I, those would be exactly the dependencies that are pre-applicable to each of $(Q)_{\Sigma,B}$ and $(Q')_{\Sigma,B}$.) In this case, $D$ does not violate any dependencies as far as $(Q)_{\Sigma,B}$ or $(Q')_{\Sigma,B}$ are concerned, as formalized in Theorem 5.3. Thus from $(Q)_{\Sigma,B} \equiv_{\Sigma,B} (Q')_{\Sigma,B}$ we obtain that $Q(D,B)$ and $Q'(D,B)$ are the same as bags on $D$.

Case 2: Suppose $D$ violates at least one dependency that is relevant in sound chase to either $Q$ or $Q'$. (In the terminology of Section I, those would be exactly the dependencies that are post-applicable to each of $(Q)_{\Sigma,B}$ and $(Q')_{\Sigma,B}$.) Still, by Theorem 5.3 the definitions of $(Q)_{\Sigma,B}$ and of $(Q')_{\Sigma,B}$ ensure that all such relevant dependencies are enforced (i.e., do not fail) on all assignments $\gamma$ that satisfy each of $(Q)_{\Sigma,B}$ and $(Q')_{\Sigma,B}$ w.r.t. $D$. Let $D_Q$ be the union of all tuples in all such satisfying assignments for $(Q)_{\Sigma,B}$ w.r.t. $D$; $D_{Q'}$ is defined analogously for $(Q')_{\Sigma,B}$. Then $D' = D_Q \cup D_{Q'}$ satisfies all the dependencies of $\Sigma$ that are relevant in chase to either $Q$ or $Q'$. Thus, from $(Q)_{\Sigma,B} \equiv_{\Sigma,B} (Q')_{\Sigma,B}$ we obtain that $Q(D',B)$ and $Q'(D',B)$ are the same as bags. From the fact that none of the tuples of $D$ that are not in $D'$ participates in forming either $Q(D,B)$ or $Q'(D,B)$, it follows that $Q(D,B)$ and $Q'(D,B)$ are the same as bags on database $D$. □

Lemma J.2. Given CQ queries $Q$, $Q'$, and given embedded dependencies $\Sigma$ on schema $\mathcal{D}$ such that there exist set-chase results $(Q)_{\Sigma,S}$ for $Q$ and $(Q')_{\Sigma,S}$ for $Q'$. Then $Q \equiv_{\Sigma,B} Q'$ holds whenever $(Q)_{\Sigma,B} \equiv_{B} (Q')_{\Sigma,B}$ in the absence of all dependencies other than the set-enforcing dependencies on $\mathcal{D}$. □

The proof of Lemma J.2 is immediate from the fact that each of $(Q)_{\Sigma,B}$ and $(Q')_{\Sigma,B}$ was obtained using sound chase steps under bag semantics (which implies $(Q)_{\Sigma,B} \equiv_{\Sigma,B} Q$ and $(Q')_{\Sigma,B} \equiv_{\Sigma,B} Q'$), as well as from Propositions 5.1 and J.1 and from transitivity of bag equivalence in presence of dependencies.

K. $\Sigma$-BASED VERSION OF PROP. 2.1

In this appendix we provide the proof of Proposition 6.1, which is the dependency-based version of Proposition 2.1 ([4], see Section 2.3 of this current paper). By Theorems 6.1 and 6.2 the proof works both for the formulation of Proposition 6.1 and for the formulation that parallels Proposition 2.1 (see Proposition K.1 below). We also provide a proof of Proposition 6.2. Finally, we provide the analogs of Theorem 6.4 for (a) CQ queries under bag-set semantics, and for (b) CQ queries with grouping and aggregation.

Proof. (Proposition 6.1)
Proof of (1): Assume

$Q \equiv_{\Sigma,B} Q'$, (18)

or, equivalently (by Theorem 6.1), assume

$(Q)_{\Sigma,B} \equiv_{B} (Q')_{\Sigma,B}$ (19)

in the absence of all dependencies other than the set-enforcing dependencies of the given database schema. Then Equation 20,

$(Q)_{\Sigma,B} \equiv_{BS} (Q')_{\Sigma,B}$, (20)

follows from Equation 19 by Proposition 2.1. Equation 21,

$(Q)_{\Sigma,B} \equiv_{\Sigma,BS} (Q')_{\Sigma,B}$, (21)

follows from Equation 20 by Proposition J.1. Equation 22,

$((Q)_{\Sigma,B})_{\Sigma,BS} \equiv_{BS} ((Q')_{\Sigma,B})_{\Sigma,BS}$, (22)

follows from Equation 21 by Theorem 6.2. Equation 23,

$(Q)_{\Sigma,BS} \equiv_{BS} (Q')_{\Sigma,BS}$, (23)

follows from Equation 22 for the following reasons:

• By Proposition 5.2 (also see Theorem 4.1 and the definitions of chase steps), the set $\Sigma_1 \subseteq \Sigma$ of dependencies that are soundly applicable to a query under bag semantics is a subset of the set $\Sigma_2 \subseteq \Sigma$ of dependencies that are soundly applicable to the same query under bag-set semantics.

• From Theorem 5.1 and its analog for bag-set semantics (Theorem G.1), it follows that $((Q)_{\Sigma,B})_{\Sigma,BS} \equiv_{BS} (Q)_{\Sigma,BS}$, and similarly $((Q')_{\Sigma,B})_{\Sigma,BS} \equiv_{BS} (Q')_{\Sigma,BS}$.

• By transitivity of $\equiv_{BS}$, we obtain Equation 23.

Finally, Equation 24,

$Q \equiv_{\Sigma,BS} Q'$, (24)

follows from Equation 23 by Theorem 6.2.

Proof of (2): Assume

$Q \equiv_{\Sigma,BS} Q'$, (25)

or, equivalently (by Theorem 6.2), assume

$(Q)_{\Sigma,BS} \equiv_{BS} (Q')_{\Sigma,BS}$. (26)

Then Equation 27,

$(Q)_{\Sigma,BS} \equiv_{S} (Q')_{\Sigma,BS}$, (27)

follows from Equation 26 by Proposition 2.1. Equation 28,

$(Q)_{\Sigma,BS} \equiv_{\Sigma,S} (Q')_{\Sigma,BS}$, (28)

follows from Equation 27 by Proposition J.1. Equation 29,

$((Q)_{\Sigma,BS})_{\Sigma,S} \equiv_{S} ((Q')_{\Sigma,BS})_{\Sigma,S}$, (29)

follows from Equation 28 by Theorem 2.2. Equation 30,

$(Q)_{\Sigma,S} \equiv_{S} (Q')_{\Sigma,S}$, (30)

follows from Equation 29 for the following reasons:

• By Proposition 5.2 (also see Theorem 4.3 and the definitions of chase steps), the set $\Sigma_1 \subseteq \Sigma$ of dependencies that are soundly applicable to a query under bag-set semantics is a subset of the set $\Sigma_2 \subseteq \Sigma$ of dependencies that are (always soundly) applicable to the same query under set semantics.

• From the analog of Theorem 5.1 for bag-set semantics (Theorem G.1) and from the definitions of chase steps, it follows that $((Q)_{\Sigma,BS})_{\Sigma,S} \equiv_{S} (Q)_{\Sigma,S}$, and similarly $((Q')_{\Sigma,BS})_{\Sigma,S} \equiv_{S} (Q')_{\Sigma,S}$.

• By transitivity of $\equiv_{S}$, we obtain Equation 30.

Finally, Equation 31,

$Q \equiv_{\Sigma,S} Q'$, (31)

follows from Equation 30 by Theorem 2.2. □

Proposition K.1. Given two CQ queries $Q_1$ and $Q_2$, and a set of embedded dependencies $\Sigma$, such that there exists the set-chase result in chase of each of $Q_1$ and $Q_2$ using $\Sigma$. Then (1) $Q_1 \equiv_{\Sigma,B} Q_2$ implies $Q_1 \equiv_{\Sigma,BS} Q_2$, and (2) $Q_1 \equiv_{\Sigma,BS} Q_2$ implies $Q_1 \equiv_{\Sigma,S} Q_2$. □
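Proposition K.1 states implications in one direction only. As a small illustration that the converse of (2) can fail, the sketch below takes an empty set of dependencies (an assumption made for this illustration, not a case singled out in the text) and two queries that differ by a redundant self-join; the query names and the sample relation are hypothetical. The code evaluates both queries on a single set-valued database; it does not by itself establish set equivalence, which for these two queries follows from containment mappings in both directions.

```python
from collections import Counter
from itertools import product

# A set-valued relation R, chosen for illustration.
R = {("a", "b"), ("a", "c")}

def q1(r):
    """Q1(A) :- r(A,B), evaluated under bag-set semantics."""
    return Counter((a,) for (a, _b) in r)

def q2(r):
    """Q2(A) :- r(A,B), r(A,C), evaluated under bag-set semantics."""
    return Counter((a,) for (a, _b), (a2, _c) in product(r, r) if a == a2)

bag_set_1, bag_set_2 = q1(R), q2(R)
# Set semantics: drop multiplicities by keeping only the distinct answer tuples.
set_1, set_2 = set(bag_set_1), set(bag_set_2)
print(set_1 == set_2, bag_set_1 == bag_set_2)
```

On this database the set answers coincide while the bag-set answers differ in the multiplicity of the tuple (a).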

We next provide a proof of Proposition 6.2.

Proof. (Proposition 6.2) Consider a pair $(Q, \Sigma)$ that satisfies conditions of Theorem 5.3. By definition of chase steps (see Section ??), in an arbitrary set-chase sequence $C = Q, Q_1, \ldots$ for $Q$ and $\Sigma$, for each element $Q_i$ of $C$ such that $Q_{i+1}$ is also an element of $C$, it holds that $Q_{i+1} \sqsubseteq_{S} Q_i$ in the absence of dependencies. (Also, trivially, for each CQ query $Q$ it holds that $Q \sqsubseteq_{S} Q$.) By transitivity and reflexivity of $\sqsubseteq_{S}$, for an arbitrary pair $(Q_i, Q_{i+j})$ (for $j \geq 0$) of elements of $C$, it holds that $Q_{i+j} \sqsubseteq_{S} Q_i$. By definition of sound chase under bag and bag-set semantics (see Section 4), the same set-containment relationship $Q_{i+j} \sqsubseteq_{S} Q_i$ holds for an arbitrary pair $(Q_i, Q_{i+j})$ (for $j \geq 0$) of elements of a sound-chase sequence $C$ under bag or bag-set semantics. The rest of the proof of Proposition 6.2 is immediate from the result of Proposition ?? that $\Sigma^{max}_{B}(Q,\Sigma) \subseteq \Sigma^{max}_{BS}(Q,\Sigma) \subseteq \Sigma$ for the above fixed pair $(Q, \Sigma)$ and from Proposition 6.3. □

We now provide the analog of Theorem 6.4 for CQ queries under bag-set semantics.

Theorem K.1. Given CQ query $Q$ and set $\Sigma$ of embedded dependencies such that set chase of $Q$ under $\Sigma$ terminates in finite time. Then Bag-Set-C&B returns all $\Sigma$-minimal reformulations $Q'$ such that $Q' \equiv_{\Sigma,BS} Q$. □

Finally, we provide the analog of Theorem 6.4 for CQ queries with grouping and aggregation.

Theorem K.2. Given CQ query $Q$ with aggregate function max, min, sum, or count, and set $\Sigma$ of embedded dependencies such that set chase of the core of $Q$ under $\Sigma$ terminates in finite time. Then (1) if the aggregate function of $Q$ is max or min, then Max-Min-C&B returns all $\Sigma$-minimal reformulations $Q'$ of $Q$ such that $Q' \equiv_{\Sigma} Q$; (2) if the aggregate function of $Q$ is sum or count, then Sum-Count-C&B returns all $\Sigma$-minimal reformulations $Q'$ of $Q$ such that $Q' \equiv_{\Sigma} Q$. □



